472,965 Members | 1,969 Online

# Comparing lists

I have to lists, A and B, that may, or may not be equal. If they are not
identical, I want the output to be three new lists, X,Y and Z where X has
all the elements that are in A, but not in B, and Y contains all the
elements that are B but not in A. Z will then have the elements that are
in both A and B.

One way of doing this is of course to iterate throug the lists and compare
each of the element, but is there a more efficient way?

--
Har du et kjøleskap, har du en TV
så har du alt du trenger for å leve

-Jokke & Valentinerne
Oct 10 '05 #1
41 3859
Odd-R. wrote:
I have to lists, A and B, that may, or may not be equal. If they are not
identical, I want the output to be three new lists, X,Y and Z where X has
all the elements that are in A, but not in B, and Y contains all the
elements that are B but not in A. Z will then have the elements that are
in both A and B.

These are set operations.
One way of doing this is of course to iterate throug the lists and compare
each of the element, but is there a more efficient way?

Maybe, using sets?

L1 = [1,2,3,4]
L2=[3,4,5,6]
diff1 = list(set(L1)-set(L2)) # [1,2]
diff2 = list(set(L2)-set(L1)) # [5,6]
symdiff = diff1+diff2 # Symmetric difference [1,2,5,6]
intersect = set(L1+L2) - set(symdiff) # Intersect [3,4]

Best,

Les

Oct 10 '05 #2
try to use set.
L1 = [1,1,2,3,4]
L2 = [1,3, 99]
A = set(L1)
B = set(L2)

X = A-B
print X

Y = B-A
print Y

Z = A | B
print Z

Cheers,
pujo

Oct 10 '05 #3
<aj****@gmail.com> wrote in message
try to use set.
L1 = [1,1,2,3,4]
L2 = [1,3, 99]
A = set(L1)
B = set(L2)

X = A-B
print X

Y = B-A
print Y

Z = A | B
print Z

But how "efficient" is this? Could you be a bit
more explicit on that point? What is the order
of complexity of set([...]) or of A-B, B-A,
A | B, A ^ B and A & B? - The Python Documentation
leaves me completely in the dark in this regard.

Sorting the two lists and then extracting
A-B, B-A, A|B, A & B and A ^ B in one single
pass seems to me very likely to be much faster
for large lists.

Regards,
Christian
Oct 10 '05 #4
"Christian Stapfer" <ni*@dev.nul> wrote:
<aj****@gmail.com> wrote:
try to use set.

Sorting the two lists and then extracting
A-B, B-A, A|B, A & B and A ^ B in one single
pass seems to me very likely to be much faster
for large lists.

Why don't you implement it, test it and time it to be more convincing about your intuition ?

George
Oct 10 '05 #5
On Mon, 10 Oct 2005 14:34:35 +0200, Christian Stapfer wrote:
Sorting the two lists and then extracting
A-B, B-A, A|B, A & B and A ^ B in one single
pass seems to me very likely to be much faster
for large lists.

Unless you are running a Python compiler in your head, chances are your
intuition has no connection to the actual performance of Python except by
pure coincidence.

Write some code, and time it in action.

In fact, here is some code I've written for you, complete with timer. Do
yourself a favour, and run it and see for yourself which is faster.

Then, if you think you can write a list version that is more efficient
than my (admittedly very inefficient) version, please do so. Then time the
difference again. Then think about how much time it took you to write,
test and debug your list version, compared to my four lines using sets.
from sets import Set
import time

def compare_and_separate(A, B):
only_A = []
only_B = []
both_AB = []
for item in A:
# ignore items we've already seen before
if item in only_A or item in both_AB:
continue
if item in B:
both_AB.append(item)
else:
only_A.append(item)
for item in B:
# ignore items we've already seen before
if item in only_B or item in both_AB:
continue
only_B.append(item)
return (only_A, only_B, both_AB)

def compare_and_separate_with_sets(A, B):
only_A = list(Set(A)-Set(B))
only_B = list(Set(B)-Set(A))
both_AB = list(Set(A+B) - Set(only_A+only_B))
return (only_A, only_B, both_AB)

A = range(199) + range(44, 300, 3) + range(250, 500) + range(2000, 2100)
B = range(1000) + range(3000, 4000)

def tester():
# confirm that the two functions give the same results
r1 = compare_and_separate(range(5), range(3,8))
r2 = compare_and_separate_with_sets(range(5), range(3,8))
print r1
print r2
if r1 == r2:
print " Same."
else:
print " Warning: results are different."
loopit = range(20) # repeat the test 20 times for timing purposes
t1 = time.time()
for i in loopit:
results = compare_and_separate(A, B)
t1 = time.time() - t1
t2 = time.time()
for i in loopit:
results = compare_and_separate_with_sets(A, B)
t2 = time.time() - t2
print "Time with loops:", t1
print "Time with sets:", t2

--
Steven.

Oct 10 '05 #6
"George Sakkis" <gs*****@rutgers.edu> wrote in message
news:1128952417.0544cc73190bbbd4e683448c137969c8@t eranews...
"Christian Stapfer" <ni*@dev.nul> wrote:
<aj****@gmail.com> wrote:
> try to use set.

Sorting the two lists and then extracting
A-B, B-A, A|B, A & B and A ^ B in one single
pass seems to me very likely to be much faster
for large lists.

Why don't you implement it, test it and time it

The problem is in the generation of the test data.
Even merely generating a set of (suitably "average",
"random", and suitably "worst case") datasets might
turn out to be a major undertaking.
If the documentation stated the order-of-magnitude
behavior of those basic operations up front, then
I (and *anyone* else who ever wanted to use those
operations on large lists / large sets) could do
a quick order-of-magnitude estimation of how
a certain program design will behave, performance
wise.
*Experimenting* is not necessarily as easy to
do as you seem to believe. How do you, for example,
hit upon the worst-case behavior with your test
data? - Without knowing *anything* about the
implementation it might a matter sheer luck.
If you *know* something about the implementation
then, of course, you might be able to figure it
out. (But note that if you know *that* much about
the implementation, you usually have an order-of-
magnitude estimate anyway and don't need to do
*any* experimenting in order to answer my question.)

Regards,
Christian
Oct 10 '05 #7
Christian Stapfer wrote:
"George Sakkis" <gs*****@rutgers.edu> wrote in message
news:1128952417.0544cc73190bbbd4e683448c137969c8@t eranews...
"Christian Stapfer" <ni*@dev.nul> wrote:

<aj****@gmail.com> wrote:

try to use set.

Sorting the two lists and then extracting
A-B, B-A, A|B, A & B and A ^ B in one single
pass seems to me very likely to be much faster
for large lists.

Why don't you implement it, test it and time it

The problem is in the generation of the test data.
Even merely generating a set of (suitably "average",
"random", and suitably "worst case") datasets might
turn out to be a major undertaking.
If the documentation stated the order-of-magnitude
behavior of those basic operations up front, then
I (and *anyone* else who ever wanted to use those
operations on large lists / large sets) could do
a quick order-of-magnitude estimation of how
a certain program design will behave, performance
wise.
*Experimenting* is not necessarily as easy to
do as you seem to believe. How do you, for example,
hit upon the worst-case behavior with your test
data? - Without knowing *anything* about the
implementation it might a matter sheer luck.
If you *know* something about the implementation
then, of course, you might be able to figure it
out. (But note that if you know *that* much about
the implementation, you usually have an order-of-
magnitude estimate anyway and don't need to do
*any* experimenting in order to answer my question.)

You are, of course, either assuming that there's a single implementation
of Python, or that all implementations have the same behaviour. Or
alternatively you are asking all implementers to do what you seem to
consider so difficult (i.e. identify worst-case scenarios and then
estimate order-of-magnitude behaviour for them).

Test results with known test data are relatively easy to extrapolate
from, and if your test data are reasonably representative of live data
then so will your performance estimates.

Anyway, aren't you more interested in average behaviour than worst-case?
Most people are.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Oct 10 '05 #8
"Steve Holden" <st***@holdenweb.com> wrote in message
news:ma*************************************@pytho n.org...
Christian Stapfer wrote:
"George Sakkis" <gs*****@rutgers.edu> wrote in message
news:1128952417.0544cc73190bbbd4e683448c137969c8@t eranews...
"Christian Stapfer" <ni*@dev.nul> wrote:
<aj****@gmail.com> wrote:

>try to use set.

Sorting the two lists and then extracting
A-B, B-A, A|B, A & B and A ^ B in one single
pass seems to me very likely to be much faster
for large lists.

Why don't you implement it, test it and time it
The problem is in the generation of the test data.
Even merely generating a set of (suitably "average",
"random", and suitably "worst case") datasets might
turn out to be a major undertaking.
If the documentation stated the order-of-magnitude
behavior of those basic operations up front, then
I (and *anyone* else who ever wanted to use those
operations on large lists / large sets) could do
a quick order-of-magnitude estimation of how
a certain program design will behave, performance
wise.
*Experimenting* is not necessarily as easy to
do as you seem to believe. How do you, for example,
hit upon the worst-case behavior with your test
data? - Without knowing *anything* about the
implementation it might a matter sheer luck.
If you *know* something about the implementation
then, of course, you might be able to figure it
out. (But note that if you know *that* much about
the implementation, you usually have an order-of-
magnitude estimate anyway and don't need to do
*any* experimenting in order to answer my question.)

You are, of course, either assuming that there's a
single implementation of Python,

Of course not!
or that all implementations have the same behaviour.
Of course not!

But it is reasonable, anyway, to ask for information
about a specific implementation (that one is *forced*
to use). Performance *will* depend on it, whether we
like it or not, whether that information is given by
the implementer or not.
And if the implementer wants to complain that
giving such information would break his wonderful
abstraction then I can only answer: It is *reality*
that *will* break your abstraction as regards
performance! It is, therefore, absolutely *no*
use to *pretend* you *can* avoid this form of "breaking
the abstraction" by simply avoiding to mention it
in the documentation...
Or alternatively you are asking all implementers
to do what you seem to consider so difficult
(i.e. identify worst-case scenarios and then estimate order-of-magnitude
behaviour for them).
I consider it the job of the implementer to know
choosing one particular implementation, and to
know what computational complexity therefore
attaches to the various operations exposed in
its interface. Why should *every* user of his
module be forced to either read the source
code of his implementation or try to figure
it out experiment-wise on his own?
Test results with known test data are relatively
easy to extrapolate from, and if your test data
are reasonably representative of live data then so will your performance
estimates.
How reasonable is it to ask me, or anyone else
for that matter, to extract, experiment-wise
(if it can be done at all with reasonable effort)
information that properly belongs to the implementer
and really should have been exposed in the
documentation in the first place?
Anyway, aren't you more interested in average
behaviour than worst-case? Most people are.

It depends. *I* am NOT the OP of this thread.
to do what he needed to do. There are many
measures of efficiency. Even "programmer efficiency"
may be one of them. But I assumed that, since
the OP took the time to ask his question in this
NG, that it really was about "computational
efficiency". Then he was offered solutions -
but they were offered just matter-of-factly.
*Surely* those who had posted those solutions
would be able to answer my question *why*
it was, that they *assumed* conversion into
sets to be more efficient than operating
on the lists in the first place. I was *not*
at all criticizing anyone (except, perhaps,
the Python documentation for its lack of
*insightful* information): I was just asking
for a *small* clarification that I *assumed*
(mistakenly it now appears) others could
*easily* provide.

Regards,
Christian
--
"Those who cannot remember the past are
condemned to repeat it."
- George Santayana
Oct 11 '05 #9
Christian Stapfer wrote:
"Steve Holden" <st***@holdenweb.com> wrote in message
news:ma*************************************@pytho n.org...
Christian Stapfer wrote:
"George Sakkis" <gs*****@rutgers.edu> wrote in message
news:1128952417.0544cc73190bbbd4e683448c137969c 8@teranews...
"Christian Stapfer" <ni*@dev.nul> wrote:

><aj****@gmail.com> wrote:
>>try to use set. A reasonable suggestion. A set must have the trade-offs involved.
The abstraction itself brings to mind the issues, and the performance
can, at least in theory, be handled there. If that is true (that
the "set" abstraction "sees" the problem), then you can rely on the
Python implementation of "set" to either now, or eventually, have
a "good" implementation -- one not too far off efficient. The Python
gang is good at this stuff; a "better" set implementation will win if
it can show better performance without related down-sides.

As to the "either now, or eventually;" if you _must_ have performance
now, not in some abstract future, then it behooves you to _test_,
_test_, _test_!
If the documentation stated the order-of-magnitude
behavior of those basic operations up front, then
I (and *anyone* else who ever wanted to use those
operations on large lists / large sets) could do
a quick order-of-magnitude estimation of how
a certain program design will behave, performance
wise. And, if the proper documentation is in place, and it
says "dictionary lookup is O(N)" (and you avoid using
it for exactly that reason), how surprised will you be
to discover that the O(N) is only reached if the hash
values of the keys are all equal?

Oh, maybe you expect "O(N)" to really mean "\Theta(N)".
Then, if you are a dweeb like me, you will respond that
"This is not possible, a dictionary of size N must take at
least 'O(lg N)' to read the key, never mind processing it."
But, it turns out, that given a bound on the size of a
process, processing an address is "O(1)", not "O(lg N)".
Is that too practical for you, or not enough?
*Experimenting* is not necessarily as easy to
do as you seem to believe.
How do you, for example, hit upon the worst-case
Are you saying the documentation should characterize the
cases that achieve worst-case behavior? This is a stiff
burden indeed, not one I am used to in even the most rigorous
classes I've seen. If there is such a characterization,
how burned will you feel if a case is overlooked and that
case is the one that you sold to your customer? Are you
willing to provide the same guaranteed performance and
documentation of performance to your customer that you
you expect of the Python system? Would you mind if the
quality is proportional to the price you paid?
You are, of course, either assuming that there's a
single implementation of Python, Of course not!
or that all implementations have the same behaviour.

Of course not!

But it is reasonable, anyway, to ask for information
about a specific implementation (that one is *forced*
to use). You are not _forced_ to use any implementation of Python.
You are free to implement your own Python system.
And if the implementer wants to complain that
giving such information would break his wonderful
abstraction then I can only answer: It is *reality*
that *will* break your abstraction as regards
performance! It is, therefore, absolutely *no*
use to *pretend* you *can* avoid this form of "breaking
the abstraction" by simply avoiding to mention it
in the documentation... I simply repeat: I have never seen a paper characterizing
the performance of an algorithm as "O(expr(N))" that described
in details _all_ cases that reached that limit. At most such
papers describe _one_ such cases. Nor have I ever seen papers
describing the performance of an algorithm as "\Theta(expr(N))"
that characterized the cases that broke the "\Theta" performance.
If you provide me the papers, provide me a C compiler with
equivalent docs on all C expressions, and provide me the funding
to update the Python docs, I will be happy to do so for a single
version of Python and a since version of CPython. I expect I
will have an easier time of it than the IronPython people will
have.
I consider it the job of the implementer to know
choosing one particular implementation, and to
know what computational complexity therefore
attaches to the various operations exposed in
its interface. Am I to take it you provide this to all of your customers?
How reasonable is it to ask me, or anyone else
for that matter, to extract, experiment-wise
(if it can be done at all with reasonable effort)
information that properly belongs to the implementer
and really should have been exposed in the
documentation in the first place?

Not at all reasonable. How reasonable is it to ask

--Scott David Daniels
sc***********@acm.org
Oct 11 '05 #10
"Scott David Daniels" <sc***********@acm.org> wrote in message
news:43******@nntp0.pdx.net...
Christian Stapfer wrote:
"Steve Holden" <st***@holdenweb.com> wrote in message
news:ma*************************************@pytho n.org...
Christian Stapfer wrote:

"George Sakkis" <gs*****@rutgers.edu> wrote in message
news:1128952417.0544cc73190bbbd4e683448c137969 c8@teranews...
>"Christian Stapfer" <ni*@dev.nul> wrote:
>
>><aj****@gmail.com> wrote:
>>>try to use set. A reasonable suggestion. A set must have the trade-offs involved.
The abstraction itself brings to mind the issues, and the performance
can, at least in theory, be handled there. If that is true (that
the "set" abstraction "sees" the problem), then you can rely on the
Python implementation of "set" to either now, or eventually, have
a "good" implementation -- one not too far off efficient.
It is, unfortunately, not infrequently the case
that there are different implementations for
a given data type that are efficient in different
ways - but that there is no one single implementation
that is the most efficient in *all* regards.
Thus the implementer must make a trade-off,
and that trade-off has consequences, performance
wise, that he might want to tell the users of his
The Python gang is good at this stuff;
I'm not denying it. The question is about
telling users upfront what the consequences
are (roughly) of the trade-offs that have
been made so that they can use the modules
that have been provided more effectively.
a "better" set implementation will win if
it can show better performance without
related down-sides.
Is there ever such a thing? I always
thought that, most of the time, there
is no such thing as a free lunch in
this field.
As to the "either now, or eventually;" if you _must_ have performance
now, not in some abstract future, then it behooves you to _test_,
_test_, _test_!
Well, I might want to test: but *first* I want
to design, preferably in "armchair style"
- at least as regards the basic approach
that I want to take. This "armchair stage"
involves not infrequently the use of rough
complexity measures to exclude the clearly
unusable and chose, instead, an approach
that is very likely efficient enough for
what I need.
If the documentation stated the order-of-magnitude
behavior of those basic operations up front, then
I (and *anyone* else who ever wanted to use those
operations on large lists / large sets) could do
a quick order-of-magnitude estimation of how
a certain program design will behave, performance
wise. And, if the proper documentation is in place, and it
says "dictionary lookup is O(N)" (and you avoid using
it for exactly that reason), how surprised will you be
to discover that the O(N) is only reached if the hash
values of the keys are all equal?

It's not that difficult to distinguish
*average* case and *worst* case scenarios.
It might even be a good idea to state,
if that can easily be done, what the
*best* case happens do be...
Oh, maybe you expect "O(N)" to really mean "\Theta(N)".
Then, if you are a dweeb like me, you will respond that
"This is not possible, a dictionary of size N must take at
least 'O(lg N)' to read the key, never mind processing it."
But, it turns out, that given a bound on the size of a
process, processing an address is "O(1)", not "O(lg N)".
Is that too practical for you, or not enough?
I do not expect a developer to expend *inordinate*
amounts of work to figure out the computational
complexity of what he has implemented. But, as I
wrote, he *must* have thought about the matter, and
is thus, by definition, in a rather good position
to provide what he can frequently simply pull from
memory when it comes to documenting the interface
of his module.
*Experimenting* is not necessarily as easy to
do as you seem to believe.
How do you, for example, hit upon the worst-case behavior with your test
data? Are you saying the documentation should characterize the
cases that achieve worst-case behavior? This is a stiff
burden indeed, not one I am used to in even the most rigorous
classes I've seen.

If it happens to be particularly difficult, for a
given implementation (of whatever abstract data type),
then, of course, state this upfront and leave it at
that.
If there is such a characterization,
how burned will you feel if a case is overlooked and that
case is the one that you sold to your customer?
I am not at all a lawyer type, as you seem to
imagine. I just want to suggest that some
(to the implementer - but not the average user)
complexity (especially for the most basic data types)
would be a good thing to have. Nothing more.
Are you
willing to provide the same guaranteed performance and
documentation of performance to your customer that you
you expect of the Python system?
I'm using python mainly as a very handy language to
write whatever (usually relatively small) utility
programs I happen to require for my main job. I am
not (currently) developing any "serious" programs
that are targeted for sale. (Thought the proper
functioning and adequate efficiency of my personal
utility programs really does matter, but it mainly
matters for *me* - and for my "customers", if I
can be said to have any, which seems doubtful,
it only matters indirectly at best.)
Even so, I frequently need to have a reasonable
idea of what computational complexity attaches
to certain operations on certain data types. But
I hardly ever have the time to go into an extended
period of first comparing several different approaches
by way of experiment.
Would you mind if the quality is proportional to
the price you paid?
There is really not need to indulge in polemics like
this. I am just making a suggestion as to what information
might *help* improve the usability of Python modules.
If making such suggestions regularly only led to negative
exchanges of this sort, there would be very little room
for learning in this community indeed...
You are, of course, either assuming that there's a
single implementation of Python,

Of course not!
or that all implementations have the same behaviour.

Of course not!

But it is reasonable, anyway, to ask for information
about a specific implementation (that one is *forced*
to use).

You are not _forced_ to use any implementation of Python.

What I wanted to say is *not*, that it appears to be
a terrible burden to my inflated consumer-ego to
have to use something not *absolutely* perfect in
*all* respects, such as Python. What I wanted to
say, here, is just this: *whenever* a Python program
is run, it will run on some specific implementation
of Python. That much should be obvious.
You are free to implement your own Python system.
Don't be ridiculous.
And if the implementer wants to complain that
giving such information would break his wonderful
abstraction then I can only answer: It is *reality*
that *will* break your abstraction as regards
performance! It is, therefore, absolutely *no*
use to *pretend* you *can* avoid this form of "breaking
the abstraction" by simply avoiding to mention it
in the documentation...

I simply repeat: I have never seen a paper characterizing
the performance of an algorithm as "O(expr(N))" that described
in details _all_ cases that reached that limit.

That's what's usually called "throwing out the
baby with the bathwater". It is you (and only
you) here, who is arguing that if perfection is
not attainable (with reasonable effort), nothing
should be done at all. I as am willing to live
with imperfect module inferface specification as
I am willing with imperfect tools more generally.
But that will not stop me from making *suggestions*
for improvement (not arrogant demands, mind you).
At most such
papers describe _one_ such cases. Nor have I ever seen papers
describing the performance of an algorithm as "\Theta(expr(N))"
that characterized the cases that broke the "\Theta" performance.
To repeat: I was not asking for the impossible, not
dumping of the (to the implementer, but not the average
user) obvious and reasonably certain information
about computational complexity of what operations
are exposed in the interface of a module.
I have seen *many* books on algorithms that specify
computational complexity measures for the algorithms
described. Any really good library of algorithms will
at least try to reach that standard, too.
If you provide me the papers, provide me a C compiler with
equivalent docs on all C expressions, and provide me the funding
to update the Python docs, I will be happy to do so for a single
version of Python and a since version of CPython.
I am not asking you, nor anyone else, to do
any amount of boring or otherwise hard work
here.
I expect I
will have an easier time of it than the IronPython people will
have.
I consider it the job of the implementer to know
choosing one particular implementation, and to
know what computational complexity therefore
attaches to the various operations exposed in
its interface.

Am I to take it you provide this to all of your customers?

I have no customers (lucky me). And I did not
want to pose as the all too frequently encountered
species of the arrogant (and equally ignorant)
customer who expects nothing but perfection
from *others* (but not himself, of course).
I am not at all like that: you are attacking
How reasonable is it to ask me, or anyone else
for that matter, to extract, experiment-wise
(if it can be done at all with reasonable effort)
information that properly belongs to the implementer
and really should have been exposed in the
documentation in the first place?

Not at all reasonable. How reasonable is it to ask

Even providing the most basic information about
the interface of a module, even providing the
mere names of member functions, could be denied
on *that* basis. Thus, this argument fails.
It fails simply because it "proves" too much...

Regards,
Christian
--
»In the long run men hit only what they aim at.
Therefore, though they should fail immediately,
they had better aim at something high.«
- Henry David Thoreau: 'Walden'
Oct 11 '05 #11

To take the heat out of the discussion:

sets are blazingly fast.

Oct 13 '05 #12
Let me begin by apologizing to Christian as I was too snippy in
my reply, and sounded even snippier than I meant to.

Christian Stapfer wrote:
"Scott David Daniels" <sc***********@acm.org> wrote in message
news:43******@nntp0.pdx.net...
a "better" set implementation will win if
it can show better performance without
related down-sides.
Is there ever such a thing? I always
thought that, most of the time, there
is no such thing as a free lunch in

If you look at the history of Python's sort, it has steadily gotten
better. The list implementations has been tweaked to produce better
performance appending and popping. There are a number of such cases.
In fact, as Python rolls along, code keeps getting improved. Usually
the requirement is that the change not degrade current benchmarks and
provide a substantial improvement in at least some practical cases.
As to the "either now, or eventually;" if you _must_ have performance
now, not in some abstract future, then it behooves you to _test_,
_test_, _test_!

Well, I might want to test: but *first* I want
to design, preferably in "armchair style" ... [using]
rough complexity measures to ....

>If the documentation stated the order-of-magnitude
>behavior of those basic operations up front, then
>I (and *anyone* else who ever wanted to use those
>operations on large lists / large sets) could do
>a quick order-of-magnitude estimation of how
>a certain program design will behave, performance
>wise.
I think this is where I started over-reacting. There are
a number of requests here over time by people who state
that things would be so much better for every one if only
someone (never themselves) did some more work that they
might not otherwise want to do. The people who implement
the code often do so on their own time. I find the Python
docs surprisingly good for even commercial documentation.
For work that is done gratis, it is phenomenal. I hesitate
to ask any more of it (although I have pointed out bugs in
docs as I've found them).
And, if the proper documentation is in place, and it
says "dictionary lookup is O(N)" (and you avoid using
it for exactly that reason), how surprised will you be
to discover that the O(N) is only reached if the hash
values of the keys are all equal?

It's not that difficult to distinguish
*average* case and *worst* case scenarios.
It might even be a good idea to state,
if that can easily be done, what the
*best* case happens do be...
Oh, maybe you expect "O(N)" to really mean "\Theta(N)".
Then, if you are a dweeb like me, you will respond that
"This is not possible, a dictionary of size N must take at
least 'O(lg N)' to read the key, never mind processing it."
But, it turns out, that given a bound on the size of a
process, processing an address is "O(1)", not "O(lg N)".
Is that too practical for you, or not enough?

I do not expect a developer to expend *inordinate*
amounts of work to figure out the computational
complexity of what he has implemented. But, as I
wrote, he *must* have thought about the matter, and
is thus, by definition, in a rather good position
to provide what he can frequently simply pull from
memory when it comes to documenting the interface
of his module.

I talked about Big-O (worst case) or Big-Theta (average case)
just to point out that no simple characterization like "O(N)"
tells enough of the story to be practically useful. Once you
decide that isn't good enough, the burden on creating the
documentation is getting substantial, especially given that
you've already spent the effort to write the code and tests
for it. In fact I'd hesitate to raise the documentation bar
any higher -- I'd hate to think someone thought, "Oh, forget
it, I'll just use this code myself."
*Experimenting* is not necessarily as easy to
>do as you seem to believe. No, "experimenting" is not always easy (though often it is
easy enough). However, "experimenting" puts the cost on the
person who derives the benefit, and is thus likely to not be
done in a slipshod way.
I am not at all a lawyer type, as you seem to
imagine. I just want to suggest that some
(to the implementer - but not the average user)
complexity (especially for the most basic data types)
would be a good thing to have. Nothing more. My point is that simple performance characterization is not
good enough to be useful, and fully accurate characterization
is an onerous documentation burden. Something in between
will be fraught with complaints about "surprising" worst
cases. Whether you would approach middle-ground documentation
with the spirit of "this is enough to go on" or not, rest
assured that a number of people will complain in a "language-
lawyer-like" way about any perceived imperfections.
Would you mind if the quality is proportional to
the price you paid? Here I was too snippy. Sorry.
You are, of course, either assuming that there's a
single implementation of Python,
or that all implementations have the same behaviour. (Each line denied with "Of course not.")
But realistically, most users will learn the performance
characteristics once, and continue to use the trade-offs
that were characteristics of that version of Python going
forward. Now maybe you behave differently, but most
programmers I know (and I explicitly include myself) do
have that tendency. Lots of python code moves around
operating systems and implementations (and the number
of implementations is growing).
But it is reasonable, anyway, to ask for information
about a specific implementation (that one is *forced*
to use).

You are not _forced_ to use any implementation of Python.

What I wanted to say is ... this: *whenever* a Python
program is run, it will run on some specific
implementation of Python. That much should be obvious.

If I were still in a snippy mood, I'd point out that you
might look at your original statement and see how it might
rather than what wanted to write.
You are free to implement your own Python system.

Don't be ridiculous.

If Pypy gets going this may not be as unlikely as you think.
You will be able to (relatively easily) replace implementations
of parts of Python and re-generate a system.
It is you (and only you) here, who is arguing that if
perfection is not attainable (with reasonable effort),
nothing should be done at all. I am trying to say the request you make is substantially larger
than you understand (or would appear at first blush). Further
it is a request that asks for work from others for your (and
other's) benefit, not work that you offer to pitch in and help on.
I am as willing to live with imperfect module inferface
specification as I am willing with imperfect tools more
generally. But that will not stop me from making
*suggestions* for improvement (not arrogant demands, mind you). Try writing something that would fit your standards for all of
the basic types, punting where you don't know details. If it is
enough to be useful, others will chip in and help fix it where it
is broken.
I have seen *many* books on algorithms that specify
computational complexity measures for the algorithms
described. Any really good library of algorithms will
at least try to reach that standard, too.

How many "really good libraries of algorithms" do you know of?
Could you name a couple that have these characterizations?
Practical code is rife with issues of "finite address space,"
low-order effects swamping the higher-order terms for most
practical sizes and such. It is this collection of nasty little
details that makes characterizing libraries frustratingly hard.
I consider it the job of the implementer to ...

Am I to take it you provide this to all of your customers?

...I did not want to pose as the customer who expects nothing
but perfection from *others* (but not himself, of course).
I am not at all like that: you are attacking

Actually, I was not attacking at all. I was trying to suggest
that it is a harder task than you imagine. I am assuming you won't
take me up on the suggestion of starting a flawed document, but I'd
be happy to be proved wrong. If not, try writing a characterization
of the performance of a dozen or so of your own programs. Does the
result seem useful in proportion to the work it took you to write it?
How reasonable is it to ask me, or anyone else
for that matter, to extract, experiment-wise
(if it can be done at all with reasonable effort)
information that properly belongs to the implementer
and really should have been exposed in the
documentation in the first place?
Not at all reasonable. How reasonable is it to ask

OK, here I was _very_ snippy. Sorry again.

I will assert that often the experiment results in a single bit of
information: "it is fast enough." Experimenting will get you reliable
answers like that, and only when run-time is an issue will you need to
go into it much further.

--Scott David Daniels
sc***********@acm.org
Oct 14 '05 #13
"jon" <jo********@capisco.com> wrote in message

To take the heat out of the discussion:

sets are blazingly fast.

I'd prefer a (however) rough characterization
of computational complexity in terms of Big-Oh
(or Big-whatever) *anytime* to marketing-type
characterizations like this one...

Regards,
Christian
Oct 15 '05 #14
On Sat, 15 Oct 2005 06:31:53 +0200, Christian Stapfer wrote:
"jon" <jo********@capisco.com> wrote in message

To take the heat out of the discussion:

sets are blazingly fast.

I'd prefer a (however) rough characterization
of computational complexity in terms of Big-Oh
(or Big-whatever) *anytime* to marketing-type
characterizations like this one...

Oh how naive.

The marketing department says: "It's O(N), so it is blindingly fast."

Translation: the amount of computation it does is linearly proportional
to N. The constant of proportionality is 1e10.

The marketing department says: "Our competitor's product is O(N**2), so it
runs like a three-legged dog."

Translation: the amount of computation it does is linearly proportional to
N squared. The constant of proportionality is 1e-100.

You do the maths.

Big O notation is practically useless for judging how fast a single
algorithm will be, or how one algorithm compares to another. It is only
useful for telling you how a single algorithm will scale as the input
increases.

It is very common for sensible programmers to fall back on a "less
efficient" O(N**2) or even O(2**N) algorithm for small amounts of data, if
that algorithm runs faster than the "more efficient" O(N) or O(log N)
algorithm. In fact, that's exactly what the sort() method does in Python:
for small enough lists, say, under 100 elements, it is quicker to run an
O(N**2) algorithm (shell sort I believe) than it is to perform the
complex set up for the merge-sort variant used for larger lists.

As for sets, they are based on dicts, which are effectively hash tables.
Hash tables are O(1), unless there are collisions, in which case the more
common algorithms degenerate to O(N). So, a very rough and ready estimate
of the complexity of the algorithm using sets would be somewhere between
O(1) and O(N) depending on the details of Python's algorithms.

So, let's do an experiment, shall we?

from sets import Set
import time

def compare_and_separate_with_sets(A, B):
AB = Set(A+B)
A = Set(A)
B = Set(B)
only_A = list(A-B)
only_B = list(B-A)
both_AB = list(AB - Set(only_A+only_B))
return (only_A, only_B, both_AB)

def timeit(f, args, n):
"""Time function f when called with *args. For timing purposes,
does n calls of f. Returns the average time used per call in seconds.
"""
loopit = range(n)
t = time.time()
for i in loopit:
results = f(*args)
t = time.time() - t
return t/n

def single_test(N):
print ("N = %-8d" % N),
A = range(N)
B = range(N/2, 3*N/2)
return timeit(compare_and_separate_with_sets, (A, B), 20)

def test_Order():
# test how compare_and_separate_with_sets scales with size
for i in range(7):
print single_test(10**i)
Now run the test code:

py> test_Order()
N = 1 0.000219106674194
N = 10 0.000135183334351
N = 100 0.000481128692627
N = 1000 0.0173740386963
N = 10000 0.103679180145
N = 100000 0.655336141586
N = 1000000 8.12827801704

In my humble opinion, that's not bad behaviour. It looks O(log N) to me,
and quite fast too: about 8 seconds to compare and separate two lists of
one million items each.

The craziest thing is, the amount of time it took to write and test two
different algorithms was probably 1% of the time it would take to hunt up
theoretical discussions of what the big O behaviour of the algorithms
would be.

--
Steven.

Oct 15 '05 #15
"Steven D'Aprano" <st***@REMOVETHIScyber.com.au> wrote in message
news:pa****************************@REMOVETHIScybe r.com.au...
On Sat, 15 Oct 2005 06:31:53 +0200, Christian Stapfer wrote:
"jon" <jo********@capisco.com> wrote in message

To take the heat out of the discussion:

sets are blazingly fast.
I'd prefer a (however) rough characterization
of computational complexity in terms of Big-Oh
(or Big-whatever) *anytime* to marketing-type
characterizations like this one...

Oh how naive.

Why is it that even computer science undergrads
are required to learn the basics of Big-Oh and
all that? Are computer scientists really morons,
as Xah Lee suggests? I can't believe it, but
maybe you have a point...
The marketing department says: "It's O(N), so it is blindingly fast."
I might as well interpret "blindingly fast"
as meaning O(1). - Why not?
Surely marketing might also have reasoned like
this: "It's O(1), so its blindingly fast".
But I *want*, nay, I *must* know whether it is
O(N) or O(1). So forget about marketingspeak,
it's a deadly poison for developers. It might
be ok to induce grandiose feelings in childish
users - but developers are grown ups: they must
face reality...
Translation: the amount of computation it does is linearly proportional
to N. The constant of proportionality is 1e10.

The marketing department says: "Our competitor's product is O(N**2), so it
runs like a three-legged dog."

Translation: the amount of computation it does is linearly proportional to
N squared. The constant of proportionality is 1e-100.

You do the maths.

Big O notation is practically useless for judging how fast a single
algorithm will be, or how one algorithm compares to another.
That's why Knuth liked it so much?
That's why Aho, Hopcroft and Ullman liked it so much?
That's why Gonnet and Baeza-Yates liked it so much?
It is only useful for telling you how a single algorithm
will scale as the input increases.
And that's really very useful information indeed.
Since, given such information for the basic data types
and operations, as implemented by the language and
its standard libraries, I stand a real chance of
being able to determine the computational complexity
of the *particular*combination* of data types and
algorithms of my own small utility or of a
critical piece of my wonderful and large application,
on which the future of my company depends, with some
confidence and accuracy.
It is very common for sensible programmers to fall back on a "less
efficient" O(N**2) or even O(2**N) algorithm for small amounts of data, if
that algorithm runs faster than the "more efficient" O(N) or O(log N)
algorithm. In fact, that's exactly what the sort() method does in Python:
for small enough lists, say, under 100 elements, it is quicker to run an
O(N**2) algorithm (shell sort I believe) than it is to perform the
complex set up for the merge-sort variant used for larger lists.

As for sets, they are based on dicts, which are effectively hash tables.
Hash tables are O(1), unless there are collisions,
Depending on the "load factor" of the hash tables.
So we would want to ask, if we have very large
lists indeed, how much space needs to be invested
to keep the load factor so low that we can say
that the membership test is O(1). Do A-B and A&B
have to walk the entire hash table (which must be
larger than the sets, because of a load factor
< 1)? Also: the conversion of lists to sets needs
the insertion of N elements into those hash tables.
That alone already makes the overall algorithm
*at*least* O(N). So forget about O(log N).
in which case the more
common algorithms degenerate to O(N).
So here, indeed, we have the kind of reasoning that
one ought to be able to deliver, based on what's in
the Python documentation. Luckily, you have that
kind the knowledge of both, how sets are implemented
and what Big-Oh attaches to the hash table operation
of "look up".
In order to *enable* SUCH reasoning for *everyone*,
starting from the module interface documentation only,
one clearly needs something along the lines that
I was suggesting...
So, a very rough and ready estimate
of the complexity of the algorithm using sets would be somewhere between
O(1) and O(N) depending on the details of Python's algorithms.

So, let's do an experiment, shall we?

from sets import Set
import time

def compare_and_separate_with_sets(A, B):
AB = Set(A+B)
A = Set(A)
B = Set(B)
only_A = list(A-B)
only_B = list(B-A)
both_AB = list(AB - Set(only_A+only_B))
return (only_A, only_B, both_AB)

def timeit(f, args, n):
"""Time function f when called with *args. For timing purposes,
does n calls of f. Returns the average time used per call in seconds.
"""
loopit = range(n)
t = time.time()
for i in loopit:
results = f(*args)
t = time.time() - t
return t/n

def single_test(N):
print ("N = %-8d" % N),
A = range(N)
B = range(N/2, 3*N/2)
return timeit(compare_and_separate_with_sets, (A, B), 20)

def test_Order():
# test how compare_and_separate_with_sets scales with size
for i in range(7):
print single_test(10**i)
Now run the test code:

py> test_Order()
N = 1 0.000219106674194
N = 10 0.000135183334351
Curious: N=10 takes less time than N=1?
N = 100 0.000481128692627
Why do we have such a comparatively large jump
here, from N=100 to N=1000? Did hash tables
overflow during conversion or something like
that?
N = 1000 0.0173740386963
N = 10000 0.103679180145
N = 100000 0.655336141586
N = 1000000 8.12827801704
Doesn't look quite O(n). Not yet...
In my humble opinion, that's not bad behaviour.
It looks O(log N) to me,
How could that be? *Every* element of A and B must touched,
if only to be copied: that can't make it O(log(N)).
Also, the conversion of lists to sets must be at least
O(N). And N isn't the right measure anyway. It would probably
have to be in terms of |A| and |B|. For example, if |A| is very
small, as compared to |B|, then A-B and A & B can be determined
rather quickly by only considering elements of A.
and quite fast too: about 8 seconds to compare and separate two lists of
one million items each.

The craziest thing is, the amount of time it took to write and test two
different algorithms was probably 1% of the time it would take to hunt up
theoretical discussions of what the big O behaviour of the algorithms
would be.

You must distinguish questions of principle
and questions of muddling through like this
testing bit you've done. It would take me some
time to even be *sure* how to interpret the
result. I would never want to say "it looks
O(log N) to me", as you do, and leave it at
that. Rather, I might say, as you do, "it
looks O(log N) to me", *but* then try to figure
out, given my knowledge of the implementation
(performance wise, based on information that
is sadly missing in the Python documentation),
*why* that might be. Then, if my experiments says
"it looks like O(log N)" AND if my basic
knowledge of the implementation of set and
list primitives says "it should be O(log N)"
as well, I would venture, with some *confidence*,
to claim: "it actually IS O(log N)"....

You do not compare the convert-it-to-sets-approach
to the single list-walk either. Supposing the OP
had actually sorted lists to begin with, then a
single, simultaneous walk of the lists would be
about as fast as it can get. Very, very likely
*faster* than conversion to sets would be...

Regards,
Christian
Oct 15 '05 #16
On Sat, 15 Oct 2005 18:17:36 +0200, Christian Stapfer wrote:
I'd prefer a (however) rough characterization
of computational complexity in terms of Big-Oh
(or Big-whatever) *anytime* to marketing-type
characterizations like this one...
Oh how naive.

Why is it that even computer science undergrads
are required to learn the basics of Big-Oh and
all that?

So that they know how to correctly interpret what Big O notation means,
instead of misinterpreting it. Big O notation doesn't tell you everything
you need to know to predict the behaviour of an algorithm. It doesn't even
tell you most of what you need to know about its behaviour. Only actual
*measurement* will tell you what you need to know.

Perhaps you should actually sit down and look at the assumptions,
simplifications, short-cuts and trade-offs that computer scientists make
when they estimate an algorithm's Big O behaviour. It might shock you out
of your faith that Big O is the be all and end all of algorithm planning.

For all but the simplest algorithm, it is impractical to actually count
all the operations -- and even if you did, the knowledge wouldn't help
you, because you don't know how long each operation takes to get executed.
That is platform specific.

So you simplify. You pretend that paging never happens. That memory
allocations take zero time. That set up and removal of data structures
take insignificant time. That if there is an N**2 term, it always swamps
an N term. You assume that code performance is independent of the CPUs.
You assume that some operations (e.g. comparisons) take no time, and
others (e.g. moving data) are expensive.

Those assumptions sometimes are wildly wrong. I've been seriously bitten
following text book algorithms written for C and Pascal: they assume that
comparisons are cheap and swapping elements are expensive. But in Python,
swapping elements is cheap and comparisons are expensive, because of all
the heavy object-oriented machinery used. Your classic text book algorithm
is not guaranteed to survive contact with the real world: you have to try
it and see.

Given all the assumptions, it is a wonder that Big O estimates are ever
useful, not that they sometimes are misleading.

[snip]
The marketing department says: "It's O(N), so it is blindingly fast."

I might as well interpret "blindingly fast" as meaning O(1). - Why not?
Surely marketing might also have reasoned like
this: "It's O(1), so its blindingly fast". But I *want*, nay, I *must*
know whether it is O(N) or O(1).

You might _want_, but you don't _need_ to know which it is, not in every
case. In general, there are many circumstances where it makes no
sense to worry about Big O behaviour. What's your expected data look like?
If your data never gets above N=2, then who cares whether it is O(1)=1,
O(N)=2, O(N**2)=4 or O(2**N)=2? They are all about as fast.

Even bubble sort will sort a three element list fast enough -- and
probably faster than more complicated sorts. Why spend all the time
setting up the machinery for a merge sort for three elements?
[snip]
Big O notation is practically useless for judging how fast a single
algorithm will be, or how one algorithm compares to another.

That's why Knuth liked it so much?
That's why Aho, Hopcroft and Ullman liked it so much? That's why Gonnet
and Baeza-Yates liked it so much?

Two reasons: it is useful for telling you how a single algorithm will
scale as the input increases, just as I said.

And, unlike more accurate ways of calculating the speed of an algorithm
from first principles, it is actually possible to do Big O calculations.

No doubt the state of the art of algorithm measurements has advanced since
I was an undergraduate, but certain fundamental facts remain: in order to
calculate that Big O, you have to close your eyes to all the realities
of practical code execution, and only consider an idealised calculation.
Even when your calculation gives you constants of proportionality and
other coefficients, Big O notation demands you throw that information away.

But by doing so, you lose valuable information. An O(N**2) algorithm that
scales like 1e-6 * N**2 will be faster than an O(N) algorithm that scales
as 1e6 * N, until N reaches one million million. By tossing away those
coefficients, you wrongly expect the first algorithm to be slower than the
second, and choose the wrong algorithm.
It is only useful for telling you how a single algorithm will scale as
the input increases.

And that's really very useful information indeed.

Yes it is. Did I ever say it wasn't?

Since, given such
information for the basic data types and operations, as implemented by
the language and its standard libraries, I stand a real chance of being
able to determine the computational complexity of the
*particular*combination* of data types and algorithms of my own small
utility or of a critical piece of my wonderful and large application, on
which the future of my company depends, with some confidence and
accuracy.
Yes, zero is a real chance.
[snip]
As for sets, they are based on dicts, which are effectively hash
tables. Hash tables are O(1), unless there are collisions,

Depending on the "load factor" of the hash tables. So we would want to
ask, if we have very large lists indeed, how much space needs to be
invested to keep the load factor so low that we can say that the
membership test is O(1).

And knowing that hash tables are O(1) will not tell you that, will it?

There is only one practical way of telling: do the experiment. Keep
loading up that hash table until you start getting lots of collisions.

Do A-B and A&B have to walk the entire hash
table (which must be larger than the sets, because of a load factor <
1)? Also: the conversion of lists to sets needs the insertion of N
elements into those hash tables. That alone already makes the overall
algorithm *at*least* O(N). So forget about O(log N).
Yes, inserting N items into a hash table takes at least N inserts. But if
those inserts are fast enough, you won't even notice the time it takes to
do it, compared to the rest of your algorithm. In many algorithms, you
don't even care about the time it takes to put items in your hash table,
because that isn't part of the problem you are trying to solve.

So in real, practical sense, it may be that your algorithm gets dominated
by the O(log N) term even though there is technically an O(N) term in
there. Are Python dicts like that? I have no idea. But I know how to find
out: choose a problem domain I care about ("dicts with less than one
million items") and do the experiment.

in which case the more
common algorithms degenerate to O(N).

So here, indeed, we have the kind of reasoning that one ought to be able
to deliver, based on what's in the Python documentation. Luckily, you
have that kind the knowledge of both, how sets are implemented and what
Big-Oh attaches to the hash table operation of "look up".
In order to *enable* SUCH reasoning for *everyone*,
starting from the module interface documentation only, one clearly needs
something along the lines that I was suggesting...

I don't object to having that Big O information available, except
insofar as it can be misleading, but I take issue with your position that
such information is necessary.
[snip]
Now run the test code:

py> test_Order()
N = 1 0.000219106674194
N = 10 0.000135183334351

Curious: N=10 takes less time than N=1?

Yes, funny how real-world results aren't as clean and neat as they are in
theory. There are those awkward assumptions coming to bite you again. I've
done two additional tests, and get:

N = 1 0.000085043907166
N = 10 0.000106656551361

N = 1 0.000497949123383
N = 10 0.000124049186707

Remember, these results are averaged over twenty trials. So why it is
quicker to do work with sets of size 10 than sets of size 1? Big O
notation will never tell you, because it ignores the implementation
details that really make a difference.
N = 100 0.000481128692627

Why do we have such a comparatively large jump here, from N=100 to
N=1000? Did hash tables overflow during conversion or something like
that?

Who knows? Maybe Python was doing some garbage collection the first time I
run it. I've modified my code to print a scale factor, and here is another
run:

N = 1 0.00113509893417
N = 10 0.000106143951416 (x 0.093511)
N = 100 0.00265134572983 (x 24.978774)
N = 1000 0.0057701587677 (x 2.176313)
N = 10000 0.0551437973976 (x 9.556721)
N = 100000 0.668345856667 (x 12.120055)
N = 1000000 8.6285964489 (x 12.910376)

An increase from N=1 to 1000000 (that's a factor of one million) leads to
an increase in execution time of about 7600.

You will notice that the individual numbers vary significantly from trial
to trial, but the over-all pattern is surprisingly consistent.

N = 1000 0.0173740386963
N = 10000 0.103679180145
N = 100000 0.655336141586
N = 1000000 8.12827801704

Doesn't look quite O(n). Not yet...

No it doesn't.
In my humble opinion, that's not bad behaviour. It looks O(log N) to
me,

That's a mistake -- it is nowhere near O(log N). My bad. Closer to
O(sqrt N), if anything.

How could that be? *Every* element of A and B must touched, if only to
be copied: that can't make it O(log(N)).
And, no doubt, if you had *really enormous* lists, oh, I don't know, maybe
a trillion items, you would see that O(N) behaviour. But until then, the
overall performance is dominated by the smaller-order terms with larger
coefficients.

Also, the conversion of lists
to sets must be at least O(N). And N isn't the right measure anyway. It
would probably have to be in terms of |A| and |B|. For example, if |A|
is very small, as compared to |B|, then A-B and A & B can be determined
rather quickly by only considering elements of A.

Both lists have the same number of elements, so double N.
[snip]
You must distinguish questions of principle and questions of muddling
through like this testing bit you've done.
Remember, you were the one who predicted that lists would have to be
faster than sets. Your prediction failed miserably.

It would take me some time to
even be *sure* how to interpret the result.
What's to interpret? I know exactly how fast the function will run, on
average, on my hardware. I can even extrapolate to larger sizes of N,
although I would be very careful to not extrapolate too far. (I predict
less than 10 minutes to work on a pair of 10,000,000 element lists, and
less than two hours to work on 100,000,000 element lists.)
I would never want to say
"it looks O(log N) to me", as you do, and leave it at that. Rather, I
might say, as you do, "it looks O(log N) to me", *but* then try to
figure out, given my knowledge of the implementation (performance wise,
based on information that is sadly missing in the Python documentation),
*why* that might be.
Fine. You have the source code, knock yourself out.
Then, if my experiments says "it looks like O(log
N)" AND if my basic knowledge of the implementation of set and list
primitives says "it should be O(log N)" as well, I would venture, with
some *confidence*, to claim: "it actually IS O(log N)"....

You do not compare the convert-it-to-sets-approach
to the single list-walk either.
No I did not, because I didn't have a function to do it. You've got my
source code. Knock yourself out to use it to test any function you like.
Supposing the OP had actually sorted
lists to begin with, then a single, simultaneous walk of the lists would
be about as fast as it can get. Very, very likely *faster* than
conversion to sets would be...

Please let us know how you go with that. It should be really interesting
to see how well your prediction copes with the real world.

(Hint: another of those awkward little implementation details... how much
work is being done in C code, and how much in pure Python? Just something
for you to think about. And remember, an O(N) algorithm in Python will be
faster than an O(N**2) algorithm in C... or is that slower?)

--
Steven.

Oct 15 '05 #17
"Steven D'Aprano" <st***@REMOVETHIScyber.com.au> wrote in message
news:pa****************************@REMOVETHIScybe r.com.au...
On Sat, 15 Oct 2005 18:17:36 +0200, Christian Stapfer wrote:
I'd prefer a (however) rough characterization
of computational complexity in terms of Big-Oh
(or Big-whatever) *anytime* to marketing-type
characterizations like this one...

Oh how naive.
Why is it that even computer science undergrads
are required to learn the basics of Big-Oh and
all that?

So that they know how to correctly interpret what Big O notation means,
instead of misinterpreting it. Big O notation doesn't tell you everything
you need to know to predict the behaviour of an algorithm.

Well, that's right. I couldn't agree more:
it doesn't tell you *everything* but it does
tell you *something*. And that *something*
is good to have.
It doesn't even tell you most of what you need to know about its
behaviour.
Only actual *measurement* will tell you what you need to know.
Well, that's where you err: Testing doesn't
tell you everything *either*. You need *both*:
a reasonable theory *and* experimental data...
If theory and experimental data disagree,
we would want to take a closer look on both,
the theory (which may be mistaken or inadequate)
*and* the experiment (which may be inadequate or
just plain botched as well).
Perhaps you should actually sit down and look at the assumptions,
simplifications, short-cuts and trade-offs that computer scientists make
when they estimate an algorithm's Big O behaviour. It might shock you out
of your faith that Big O is the be all and end all of algorithm planning.

For all but the simplest algorithm, it is impractical to actually count
all the operations -- and even if you did, the knowledge wouldn't help
you, because you don't know how long each operation takes to get executed.
That is platform specific.

So you simplify. You pretend that paging never happens. That memory
allocations take zero time. That set up and removal of data structures
take insignificant time. That if there is an N**2 term, it always swamps
an N term. You assume that code performance is independent of the CPUs.
You assume that some operations (e.g. comparisons) take no time, and
others (e.g. moving data) are expensive.

Those assumptions sometimes are wildly wrong. I've been seriously bitten
following text book algorithms written for C and Pascal: they assume that
comparisons are cheap and swapping elements are expensive. But in Python,
swapping elements is cheap and comparisons are expensive, because of all
the heavy object-oriented machinery used. Your classic text book algorithm
is not guaranteed to survive contact with the real world: you have to try
it and see.
Still: the expensiveness of those operations (such
as swapping elements vs. comparisons) will only
affect the constant of proportionality, not the
asymptotic behavior of the algorithm. Sooner or
later the part of your program that has the
worst asymptotic behavior will determine speed
(or memory requirements) of your program.
Given all the assumptions, it is a wonder that Big O
estimates are ever useful, not that they sometimes

[snip]
The marketing department says: "It's O(N), so it is blindingly fast."
I might as well interpret "blindingly fast" as meaning O(1). - Why not?
Surely marketing might also have reasoned like
this: "It's O(1), so its blindingly fast". But I *want*, nay, I *must*
know whether it is O(N) or O(1).

You might _want_, but you don't _need_ to know which it is, not in every
case. In general, there are many circumstances where it makes no
sense to worry about Big O behaviour. What's your expected data look like?
If your data never gets above N=2, then who cares whether it is O(1)=1,
O(N)=2, O(N**2)=4 or O(2**N)=2? They are all about as fast.

Even bubble sort will sort a three element list fast enough -- and
probably faster than more complicated sorts. Why spend all the time
setting up the machinery for a merge sort for three elements?

Since you have relapsed into a fit of mere polemics,
I assume to have made my point as regards marketing
type characterizations of algorithms ("blazingly
fast") vs. measures, however rough, of asymptotic
complexity measures, like Big-Oh. - Which really
was the root of this sub-thread that went like this:

...>> To take the heat out of the discussion:
... >> sets are blazingly fast.

... > I'd prefer a (however) rough characterization
... > of computational complexity in terms of Big-Oh
... > (or Big-whatever) *anytime* to marketing-type
... > characterizations like this one...
[snip]
Big O notation is practically useless for judging how fast a single ^^^^^^^^^^^^^^^^^^^^^^ algorithm will be, or how one algorithm compares to another.
That's why Knuth liked it so much?
That's why Aho, Hopcroft and Ullman liked it so much? That's why Gonnet
and Baeza-Yates liked it so much?

Two reasons: it is useful for telling you how a single algorithm will
scale as the input increases, just as I said.

Curiously, just a few lines before writing this,
you have polemically denied any "practical" use
for Big-Oh notation.
And, unlike more accurate ways of calculating the speed of an algorithm
from first principles, it is actually possible to do Big O calculations.
Right. It's a compromise: being somewhat precise
- without getting bogged down trying to solve major
combinatorial research problems...
No doubt the state of the art of algorithm measurements has advanced since
I was an undergraduate, but certain fundamental facts remain: in order to
calculate that Big O, you have to close your eyes to all the realities
of practical code execution, and only consider an idealised calculation.
That's right. Nothing stops you from then opening
your eyes and testing some code, of course. *But*
always try to relate what you see there with what
theoretical grasp of the situation you have.
If experimental data and theory *disagree*: try to
fix the experiment and/or the theory.
Even when your calculation gives you constants of proportionality and
other coefficients, Big O notation demands you throw that information
away.

But by doing so, you lose valuable information. An O(N**2) algorithm that
scales like 1e-6 * N**2 will be faster than an O(N) algorithm that scales
as 1e6 * N, until N reaches one million million. By tossing away those
coefficients, you wrongly expect the first algorithm to be slower than the
second, and choose the wrong algorithm.
It is only useful for telling you how a single algorithm will scale as
the input increases.
And that's really very useful information indeed.

Yes it is. Did I ever say it wasn't?

Well yes, by the way you attacked Big-Oh notation
as "practically useless" (see above) I assumed you
did.
Since, given such
information for the basic data types and operations, as implemented by
the language and its standard libraries, I stand a real chance of being
able to determine the computational complexity of the
*particular*combination* of data types and algorithms of my own small
utility or of a critical piece of my wonderful and large application, on
which the future of my company depends, with some confidence and
accuracy.

Yes, zero is a real chance.
[snip]
As for sets, they are based on dicts, which are effectively hash
tables. Hash tables are O(1), unless there are collisions,

Depending on the "load factor" of the hash tables. So we would want to
ask, if we have very large lists indeed, how much space needs to be
invested to keep the load factor so low that we can say that the
membership test is O(1).

And knowing that hash tables are O(1) will not tell you that, will it?

There is only one practical way of telling: do the experiment. Keep
loading up that hash table until you start getting lots of collisions.
Do A-B and A&B have to walk the entire hash
table (which must be larger than the sets, because of a load factor <
1)? Also: the conversion of lists to sets needs the insertion of N
elements into those hash tables. That alone already makes the overall
algorithm *at*least* O(N). So forget about O(log N).

Yes, inserting N items into a hash table takes at least N inserts. But if
those inserts are fast enough, you won't even notice the time it takes to
do it, compared to the rest of your algorithm. In many algorithms, you
don't even care about the time it takes to put items in your hash table,
because that isn't part of the problem you are trying to solve.

So in real, practical sense, it may be that your algorithm gets dominated
by the O(log N) term even though there is technically an O(N) term in
there. Are Python dicts like that? I have no idea. But I know how to find
out: choose a problem domain I care about ("dicts with less than one
million items") and do the experiment.

in which case the more
common algorithms degenerate to O(N).

So here, indeed, we have the kind of reasoning that one ought to be able
to deliver, based on what's in the Python documentation. Luckily, you
have that kind the knowledge of both, how sets are implemented and what
Big-Oh attaches to the hash table operation of "look up".
In order to *enable* SUCH reasoning for *everyone*,
starting from the module interface documentation only, one clearly needs
something along the lines that I was suggesting...

I don't object to having that Big O information available, except
insofar as it can be misleading, but I take issue with your position that
such information is necessary.

*Blindly* testing, that is, testing *without* being
able to *relate* the outcomes of those tests (even
the *design* of those tests) to some suitably
simplified but not at all completely nonsensical
theory (via Big-Oh notation, for example), is *not*
really good enough.
Now run the test code:

py> test_Order()
N = 1 0.000219106674194
N = 10 0.000135183334351

Curious: N=10 takes less time than N=1?

Yes, funny how real-world results aren't as clean and neat as they are in
theory. There are those awkward assumptions coming to bite you again. I've
done two additional tests, and get:

N = 1 0.000085043907166
N = 10 0.000106656551361

N = 1 0.000497949123383
N = 10 0.000124049186707

Remember, these results are averaged over twenty trials. So why it is
quicker to do work with sets of size 10 than sets of size 1? Big O
notation will never tell you, because it ignores the implementation
details that really make a difference.
N = 100 0.000481128692627

Why do we have such a comparatively large jump here, from N=100 to
N=1000? Did hash tables overflow during conversion or something like
that?

Who knows? Maybe Python was doing some garbage collection the first time I
run it. I've modified my code to print a scale factor, and here is another
run:

N = 1 0.00113509893417
N = 10 0.000106143951416 (x 0.093511)
N = 100 0.00265134572983 (x 24.978774)
N = 1000 0.0057701587677 (x 2.176313)
N = 10000 0.0551437973976 (x 9.556721)
N = 100000 0.668345856667 (x 12.120055)
N = 1000000 8.6285964489 (x 12.910376)

An increase from N=1 to 1000000 (that's a factor of one million) leads to
an increase in execution time of about 7600.

You will notice that the individual numbers vary significantly from trial
to trial, but the over-all pattern is surprisingly consistent.

N = 1000 0.0173740386963
N = 10000 0.103679180145
N = 100000 0.655336141586
N = 1000000 8.12827801704

Doesn't look quite O(n). Not yet...

No it doesn't.
In my humble opinion, that's not bad behaviour. It looks O(log N) to
me,
That's a mistake -- it is nowhere near O(log N). My bad. Closer to
O(sqrt N), if anything.

How could that be? *Every* element of A and B must touched, if only to
be copied: that can't make it O(log(N)).

And, no doubt, if you had *really enormous* lists, oh, I don't know, maybe
a trillion items, you would see that O(N) behaviour. But until then, the
overall performance is dominated by the smaller-order terms with larger
coefficients.

Also, the conversion of lists
to sets must be at least O(N). And N isn't the right measure anyway. It
would probably have to be in terms of |A| and |B|. For example, if |A|
is very small, as compared to |B|, then A-B and A & B can be determined
rather quickly by only considering elements of A.

Both lists have the same number of elements, so double N.
[snip]
You must distinguish questions of principle and questions of muddling
through like this testing bit you've done.

Remember, you were the one who predicted that lists would have to be
faster than sets.

I didn't say they would *have* to be faster
- I was mainly asking for some *reasoned*
argument why (and in what sense) conversion
to sets would be an "efficient" solution
of the OPs problem.
Interestingly, you admit that you did not
really compare the two approaches that
were under discussion. So your experiment
does *not* (yet) prove what you claim it
proves.
It would take me some time to
even be *sure* how to interpret the result.

What's to interpret? I know exactly how fast the function will run, on
average, on my hardware. I can even extrapolate to larger sizes of N,
although I would be very careful to not extrapolate too far. (I predict
less than 10 minutes to work on a pair of 10,000,000 element lists, and
less than two hours to work on 100,000,000 element lists.)

For a starter: You have chosen a very particular type
of element of those lists / sets: integers. So the
complexity of comparisons for the OPs application
might get *seriously* underestimated.
I would never want to say
"it looks O(log N) to me", as you do, and leave it at that. Rather, I
might say, as you do, "it looks O(log N) to me", *but* then try to
figure out, given my knowledge of the implementation (performance wise,
based on information that is sadly missing in the Python documentation),
*why* that might be.
Fine. You have the source code, knock yourself out.

That's just what I do *not* think to be a particularly
reasonable approach. Instead, I propose stating
some (to the implementer *easily* available)
information about asymptotic behavior of operations
that are exposed by the module interface upfront.
Then, if my experiments says "it looks like O(log
N)" AND if my basic knowledge of the implementation of set and list
primitives says "it should be O(log N)" as well, I would venture, with
some *confidence*, to claim: "it actually IS O(log N)"....

You do not compare the convert-it-to-sets-approach
to the single list-walk either.

No I did not, because I didn't have a function to do it.

Here we see one of the problems of a purely
experimentalist approach to computational complexity:
you need an implementation (of the algorithm and the
test harness) *before* you can get your wonderfully
decisive experimental data.
This is why we would like to have a way of (roughly)
estimating the reasonableness of the outlines of a
having to write any code and/or test harness.
You've got my
source code. Knock yourself out to use it to test any function you like.
Supposing the OP had actually sorted
lists to begin with, then a single, simultaneous walk of the lists would
be about as fast as it can get. Very, very likely *faster* than
conversion to sets would be...

Please let us know how you go with that. It should be really interesting
to see how well your prediction copes with the real world.

(Hint: another of those awkward little implementation details... how much
work is being done in C code, and how much in pure Python? Just something
for you to think about. And remember, an O(N) algorithm in Python will be
faster than an O(N**2) algorithm in C... or is that slower?)

This discussion begins to sound like the recurring
arguments one hears between theoretical and
experimental physicists. Experimentalists tend
to overrate the importance of experimental data
(setting up a useful experiment, how to interpret
the experimental data one then gathers, and whether
one stands any chance of detecting systematic errors
of measurement, all depend on having a good *theory*
in the first place). Theoreticians, on the other hand,
tend to overrate the importance of the coherence of
theories. In truth, *both* are needed: good theories
*and* carefully collected experimental data.

Regards,
Christian
--
»When asked how he would have reacted if Eddington's
*measurements* had come out differently, Einstein
replied: "Then I would have been sorry for him
- the *theory* is correct."«
- Paul B. Armstrong: 'Conflicting Readings'
Oct 16 '05 #18
Christian Stapfer wrote:
This discussion begins to sound like the recurring
arguments one hears between theoretical and
experimental physicists. Experimentalists tend
to overrate the importance of experimental data
(setting up a useful experiment, how to interpret
the experimental data one then gathers, and whether
one stands any chance of detecting systematic errors
of measurement, all depend on having a good *theory*
in the first place). Theoreticians, on the other hand,
tend to overrate the importance of the coherence of
theories. In truth, *both* are needed: good theories
*and* carefully collected experimental data.

Regards,
Christian

An interesting parallel can be made concerning management of production
vs management of creativity.

In general, production needs checks and feedback to insure quality, but
will often come to a stand still if incomplete resources are available.

Where as creativity needs checks to insure production, but in many cases
can still be productive even with incomplete or questionable resources.
The quality may very quite a bit in both directions, but in creative
tasks, that is to be expected.

In many ways programmers are a mixture of these two. I think I and
Steven use a style that is closer to the creative approach. I get the
feeling your background may be closer to the production style.

Both are good and needed for different types of tasks. And I think most
programmers can switch styles to some degree if they need to.

Cheers,
Ron

Oct 16 '05 #19
Christian Stapfer wrote:
This discussion begins to sound like the recurring
arguments one hears between theoretical and
experimental physicists. Experimentalists tend
to overrate the importance of experimental data
(setting up a useful experiment, how to interpret
the experimental data one then gathers, and whether
one stands any chance of detecting systematic errors
of measurement, all depend on having a good *theory*
in the first place). Theoreticians, on the other hand,
tend to overrate the importance of the coherence of
theories. In truth, *both* are needed: good theories
*and* carefully collected experimental data.

Regards,
Christian

An interesting parallel can be made concerning management of production vs
management of creativity.

In general, production needs checks and feedback to insure quality, but
will often come to a stand still if incomplete resources are available.

Where as creativity needs checks to insure production, but in many cases
can still be productive even with incomplete or questionable resources.
The quality may very quite a bit in both directions, but in creative
tasks, that is to be expected.

In many ways programmers are a mixture of these two. I think I and Steven
use a style that is closer to the creative approach. I get the feeling
your background may be closer to the production style.

This diagnosis reminds me of C.G. Jung, the psychologist,
who, after having introduced the concepts of extra- and
introversion, came to the conclusion that Freud was
an extravert whereas Adler an introvert. The point is
that he got it exactly wrong...

As to the value of complexity theory for creativity
in programming (even though you seem to believe that
a theoretical bent of mind can only serve to stifle
creativity), the story of the discovery of an efficient
string searching algorithm by D.E.Knuth provides an
interesting case in point. Knuth based himself on
seemingly quite "uncreatively theoretical work" (from
*your* point of view) that gave a *better* value for
the computuational complexity of string searching
than any of the then known algorithms could provide.

Regards,
Christian
--
»It is no paradox to say that in our most theoretical
moods we may be nearest to our most practical applications.«

[and those "practical applications" will likely be most
"creative" ones..]
Oct 16 '05 #20
Christian Stapfer wrote:
This discussion begins to sound like the recurring
arguments one hears between theoretical and
experimental physicists. Experimentalists tend
to overrate the importance of experimental data
(setting up a useful experiment, how to interpret
the experimental data one then gathers, and whether
one stands any chance of detecting systematic errors
of measurement, all depend on having a good *theory*
in the first place). Theoreticians, on the other hand,
tend to overrate the importance of the coherence of
theories. In truth, *both* are needed: good theories
*and* carefully collected experimental data.

Regards,
Christian

An interesting parallel can be made concerning management of production vs
management of creativity.

In general, production needs checks and feedback to insure quality, but
will often come to a stand still if incomplete resources are available.

Where as creativity needs checks to insure production, but in many cases
can still be productive even with incomplete or questionable resources.
The quality may very quite a bit in both directions, but in creative
tasks, that is to be expected.

In many ways programmers are a mixture of these two. I think I and Steven
use a style that is closer to the creative approach. I get the feeling
your background may be closer to the production style.

Both are good and needed for different types of tasks. And I think most
programmers can switch styles to some degree if they need to.

Come to think of an experience that I shared
with a student who was one of those highly
creative experimentalists you seem to have
in mind. He had just bought a new PC and
wanted to check how fast its floating point
unit was as compared to our VAX. After
having done his wonderfully creative
experimenting, he was utterly dejected: "Our (old)
VAX is over 10'000 times faster than my new PC",
he told me, almost in despair. Whereupon I,
always the uncreative, dogmatic theoretician,
who does not believe that much in the decisiveness
of the outcome of mere experiments, told him
that this was *impossible*, that he *must* have

It turned out that the VAX compiler had been
clever enough to hoist his simple-minded test
code out of the driving loop. In fact, our VAX
calculated the body of the loop only *once*
and thus *immediately* announced that it had finished
the whole test - the compiler on this student's
PC, on the other hand, had not been clever enough
for this type of optimization: hence the difference...

I think this is really a cautionary tale for
experimentalists: don't *believe* in the decisiveness
of the outcomes your experiments, but try to *understand*
of the situation)...

Regards,
Christian
Oct 16 '05 #21
Christian Stapfer wrote:
As to the value of complexity theory for creativity
in programming (even though you seem to believe that
a theoretical bent of mind can only serve to stifle
creativity), the story of the discovery of an efficient
string searching algorithm by D.E.Knuth provides an
interesting case in point. Knuth based himself on
seemingly quite "uncreatively theoretical work" (from
*your* point of view) that gave a *better* value for
the computuational complexity of string searching
than any of the then known algorithms could provide.

are you talking about KMP? I'm not sure that's really a good example of
how useful "theoretical work" really is in practice:

- Morris had already implemented the algorithm (in 1968) when Knuth "dis-
covered" it (1971 or later), so the "then known" part of your argument is
obviously bogus. "then known by theoretical computer scientists" might
be correct, though.

- (iirc, Knuth's first version wasn't practical to use; this was fixed by Pratt)

- (and iirc, Boyer-Moore had already been invented when Knuth published the
first paper on KMP (in 1977))

- for use cases where the setup overhead is irrelevant, Boyer-Moore is almost
always faster than KMP. for many such cases, BM is a lot faster.

- for use cases such as Python's "find" method where the setup overhead cannot
be ignored, a brute-force search is almost always faster than KMP.

- for use cases such as Python's "find" method, a hybrid approach is almost
always faster than a brute-force search.

in other words, the "better" computational complexity of KMP has turned out
to be mostly useless, in practice.

</F>

Oct 16 '05 #22
On Sun, 16 Oct 2005 15:16:39 +0200, Christian Stapfer wrote:
Come to think of an experience that I shared
with a student who was one of those highly
creative experimentalists you seem to have
in mind. He had just bought a new PC and
wanted to check how fast its floating point
unit was as compared to our VAX. After
having done his wonderfully creative
experimenting, he was utterly dejected: "Our (old)
VAX is over 10'000 times faster than my new PC",
he told me, almost in despair.
Which it was. It finished executing his code in almost 1/10,000th of the
time his PC could do.
Whereupon I,
always the uncreative, dogmatic theoretician,
who does not believe that much in the decisiveness
of the outcome of mere experiments, told him
that this was *impossible*, that he *must* have
It wasn't a mistake and it did happen. The VAX finished the calculation
10,000 times faster than his PC. You have a strange concept of "impossible".

It turned out that the VAX compiler had been
clever enough to hoist his simple-minded test
code out of the driving loop.
Optimizations have a tendency to make a complete mess of Big O
calculations, usually for the better. How does this support your
theory that Big O is a reliable predictor of program speed?

For the record, the VAX 9000 can have up to four vector processors each
running at up to 125 MFLOPS each, or 500 in total. A Pentium III runs at
about 850 Mflops. Comparing MIPS or FLOPS from one system to another is
very risky, for many reasons, but as a very rough and ready measure
of comparison, a four processor VAX 9000 is somewhere about the
performance of a P-II or P-III, give or take some fudge factor.

So, depending on when your student did this experiment, it is entirely
conceivable that the VAX might have been faster even without the
optimization you describe. Of course, you haven't told us what model VAX,
or how many processors, or what PC your student had, so this comparison
might not be relevant.
In fact, our VAX
calculated the body of the loop only *once*
and thus *immediately* announced that it had finished
the whole test - the compiler on this student's
PC, on the other hand, had not been clever enough
for this type of optimization: hence the difference...
Precisely. And all the Big O notation is the world will not tell you that.
Only an experiment will. Now, perhaps in the simple case of a bare loop
doing the same calculation over and over again, you might be able to
predict ahead of time what optimisations the compiler will do. But for
more complex algorithms, forget it.

This is a clear case of experimentation leading to the discovery
of practical results which could not be predicted from Big O calculations.
I find it quite mind-boggling that you would use as if it was a triumph
of abstract theoretical calculation when it was nothing of the sort.

I think this is really a cautionary tale for
experimentalists: don't *believe* in the decisiveness
of the outcomes your experiments, but try to *understand*
of the situation)...

Or, to put it another way: your student discovered something by running an
experimental test of his code that he would never have learnt in a million
years of analysis of his algorithm: the VAX compiler was very cleverly
optimized.

The fact that your student didn't understand the problem well enough to
craft a good test of it is neither here nor there.

--
Steven.

Oct 16 '05 #23
Christian Stapfer wrote:
Christian Stapfer wrote:

This discussion begins to sound like the recurring
arguments one hears between theoretical and
experimental physicists. Experimentalists tend
to overrate the importance of experimental data
(setting up a useful experiment, how to interpret
the experimental data one then gathers, and whether
one stands any chance of detecting systematic errors
of measurement, all depend on having a good *theory*
in the first place). Theoreticians, on the other hand,
tend to overrate the importance of the coherence of
theories. In truth, *both* are needed: good theories
*and* carefully collected experimental data.

Regards,
Christian
An interesting parallel can be made concerning management of production vs
management of creativity.

In general, production needs checks and feedback to insure quality, but
will often come to a stand still if incomplete resources are available.

Where as creativity needs checks to insure production, but in many cases
can still be productive even with incomplete or questionable resources.
The quality may very quite a bit in both directions, but in creative
tasks, that is to be expected.

In many ways programmers are a mixture of these two. I think I and Steven
use a style that is closer to the creative approach. I get the feeling
your background may be closer to the production style.

This diagnosis reminds me of C.G. Jung, the psychologist,
who, after having introduced the concepts of extra- and
introversion, came to the conclusion that Freud was
an extravert whereas Adler an introvert. The point is
that he got it exactly wrong...

As to the value of complexity theory for creativity
in programming (even though you seem to believe that
a theoretical bent of mind can only serve to stifle
creativity), the story of the discovery of an efficient
string searching algorithm by D.E.Knuth provides an
interesting case in point. Knuth based himself on
seemingly quite "uncreatively theoretical work" (from
*your* point of view) that gave a *better* value for
the computuational complexity of string searching
than any of the then known algorithms could provide.

Regards,
Christian

(even though you seem to believe that
a theoretical bent of mind can only serve to stifle
creativity)

No, that is not at all what I believe. What I believe is, "The
insistence of strict conditions can limit creative outcomes."

The lack of those limits does not prevent one from using any resources
(including theoretical ones) if they are available.

You seem to be rejecting experimental results in your views. And the
level of insistence you keep in that view, leads me to believe you favor
a more productive environment rather than a more creative one. Both are
good, and I may entirely wrong about you, as many people are capable of
wearing different hats depending on the situation.

I think the gist of this thread may come down to...

In cases where it is not clear on what direction to go because the
choices are similar enough to make the choosing difficult. It is almost
always better to just pick one and see what happens than to do nothing.

Cheers,
Ron

Oct 16 '05 #24
"Fredrik Lundh" <fr*****@pythonware.com> wrote in message
news:ma*************************************@pytho n.org...
Christian Stapfer wrote:
As to the value of complexity theory for creativity
in programming (even though you seem to believe that
a theoretical bent of mind can only serve to stifle
creativity), the story of the discovery of an efficient
string searching algorithm by D.E.Knuth provides an
interesting case in point. Knuth based himself on
seemingly quite "uncreatively theoretical work" (from
*your* point of view) that gave a *better* value for
the computuational complexity of string searching
than any of the then known algorithms could provide.

Yes. I cannot give you the source of the story,
unfortunately, because I only have the *memory* of
it but don't know exactly *where* I happended to read
it. There, Knuth was said to have first analyzed the
theoretical argument very, very carefully to figure
out *why* it was that the theoretical bound was so
much better than all "practically known" algorithms.
It was by studing the theoretical work on computational
complexity *only* that the light dawned upon him.
(But of course, Knuth is "an uncreative dumbo fit
only for production work" - I am speaking ironically
here, which should be obvious.)
I'm not sure that's really a good example of
how useful "theoretical work" really is in practice:
Oh sure, yes, yes, it is. But my problem is to find
a good source of the original story. Maybe one
the "better" computational complexity of KMP has
turned out to be mostly useless, in practice.

Well, that's how things might turn out in the long run.
Still, at the time, to all appearances, it *was* a
case of practical creativity *triggered* by apparently
purely theoretical work in complexity theory.

More interesting than your trying to shoot down
one special case of the more general phenomenon of
theory engendering creativity would be to know
your position on the more general question...

It happens *often* in physics, you known. Einstein
is only one example of many. Pauli's prediction of
the existence of the neutrino is another. It took
experimentalists a great deal of time and patience
(about 20 years, I am told) until they could finally
muster something amounting to "experimental proof"
of Pauli's conjecture.

Regards,
Christian
--
"Experience without theory is blind,
but theory without experience is mere
intellectual play."
- Immanuel Kant

»Experience remains, of course, the sole criterion
of the *utility* of a mathematical construction.
But *the*creative*principle* resides in mathematics.«
- Albert Einstein: The World As I See It

»The astronomer Walter Baade told me that, when he
was dining with Pauli one day, Pauli exclaimed,
"Today I have done the worst thing for a theoretical
physicist. I have invented something which can never
be detected experimentally." Baade immediately offered
to bet a crate of champagne that the elusive neutrino
would one day prove amenable to experimental discovery.
Pauli accepted, unwisely failing to specify any time
limit, which made it impossible for him ever to win
the bet. Baade collected his crate of champagne (as
I can testify, having helped Baade consume a bottle of it)
when, just over twenty years later, in 1953, Cowan and
Reines did indeed succeed in detecting Paulis particle.«
- Fred Hoyle: Astronomy and Cosmology
Oct 16 '05 #25
Christian Stapfer wrote:
It turned out that the VAX compiler had been
clever enough to hoist his simple-minded test
code out of the driving loop. In fact, our VAX
calculated the body of the loop only *once*
and thus *immediately* announced that it had finished
the whole test - the compiler on this student's
PC, on the other hand, had not been clever enough
for this type of optimization: hence the difference...

I think this is really a cautionary tale for
experimentalists: don't *believe* in the decisiveness
of the outcomes your experiments, but try to *understand*
of the situation)...

Regards,
Christian

True understanding is of course the ideal, but as complexity increases
even theoretical information on a complex system becomes incomplete as
there are often other influences that will effect the outcome.

So the you could say: don't *depend* on the completeness of your
theoretical information, try to *verify* the validity of your results
with experiments.

Cheers,
Ron

Oct 16 '05 #26
Christian Stapfer wrote:
Christian Stapfer wrote:
This discussion begins to sound like the recurring
arguments one hears between theoretical and
experimental physicists. Experimentalists tend
to overrate the importance of experimental data
(setting up a useful experiment, how to interpret
the experimental data one then gathers, and whether
one stands any chance of detecting systematic errors
of measurement, all depend on having a good *theory*
in the first place). Theoreticians, on the other hand,
tend to overrate the importance of the coherence of
theories. In truth, *both* are needed: good theories
*and* carefully collected experimental data.

Regards,
Christian

An interesting parallel can be made concerning management of production
vs
management of creativity.

In general, production needs checks and feedback to insure quality, but
will often come to a stand still if incomplete resources are available.

Where as creativity needs checks to insure production, but in many cases
can still be productive even with incomplete or questionable resources.
The quality may very quite a bit in both directions, but in creative
tasks, that is to be expected.

In many ways programmers are a mixture of these two. I think I and
Steven
use a style that is closer to the creative approach. I get the feeling
your background may be closer to the production style.

This diagnosis reminds me of C.G. Jung, the psychologist,
who, after having introduced the concepts of extra- and
introversion, came to the conclusion that Freud was
an extravert whereas Adler an introvert. The point is
that he got it exactly wrong...

As to the value of complexity theory for creativity
in programming (even though you seem to believe that
a theoretical bent of mind can only serve to stifle
creativity), the story of the discovery of an efficient
string searching algorithm by D.E.Knuth provides an
interesting case in point. Knuth based himself on
seemingly quite "uncreatively theoretical work" (from
*your* point of view) that gave a *better* value for
the computational complexity of string searching
than any of the then known algorithms could provide.

Regards,
Christian

(even though you seem to believe that
a theoretical bent of mind can only serve to stifle
creativity)

No, that is not at all what I believe. What I believe is, "The insistence
of strict conditions can limit creative outcomes."

That's agreed. But going off *blindly*experimenting*
without trying to relate the outcome of that experimenting
back to ones theoretical grasp of the work one is doing
is *not* a good idea. Certainly not in the long run.
In fact, muddling-trough and avoiding the question
of suitable theoretical support for one's work is
perhaps more typical of production environments.
The lack of those limits does not prevent one from using any resources
(including theoretical ones) if they are available.

You seem to be rejecting experimental results in your views.
my posts in this thread and are simply projecting wildly,
as psychoanalysts would call it, that is all.
And the level of insistence you keep in that view,
A view that I do not really have: you are really projecting
indeed.
leads me to believe you favor a more productive environment
rather than a more creative one.
You are mistaken. Although I have some "practical background"
(originally working as a "self-taught" programmer - although,
ironically, for a "development and research department"),
I went on to study mathematics at the Federal Institute
of Technology here in Switzerland. Do you want to say that
having been trained as a mathematician makes one uncreative?
- But it is true that mathematicians are socialized in such
a way that they tend to take over rather high standards of
precision and theoretical grounding of their work.
Both are good, and I may entirely wrong about you,
... you are at least *somewhat* wrong about me,
that I am quite sure of...
as many people are capable of wearing different hats depending on the
situation.

I think the gist of this thread may come down to...

In cases where it is not clear on what direction to go because the choices
are similar enough to make the choosing difficult. It is almost always
better to just pick one and see what happens than to do nothing.

As it appears, not even my most recent post has had
*any* recognizable effect on your thoroughly
misapprehending my position.

Regards,
Christian
Oct 16 '05 #27
On Sun, 16 Oct 2005 19:42:11 +0200, Christian Stapfer wrote:
Pauli's prediction of
the existence of the neutrino is another. It took
experimentalists a great deal of time and patience
(about 20 years, I am told) until they could finally
muster something amounting to "experimental proof"
of Pauli's conjecture.

Pauli's conjecture was the result of experimental evidence that was
completely inexplicable according to the theory of the day: energy and
spin was disappearing from certain nuclear reactions. This was an
experimental result that needed to be explained, and Pauli's solution was
to invent an invisible particle that carried that energy and spin away.

(When I put it like that, it sounds stupid, but in fact it was an elegant
and powerful answer to the problem.)

The neutrino wasn't something that Pauli invented from theoretical first
principles. It came out of hard experimental results.

Physics of the last half century is littered with the half-forgotten
corpses of theoretical particles that never eventuated: gravitinos,
photinos, tachyons, rishons, flavons, hypercolor pre-quarks, axions,

Neutrinos and quarks are exceptional in that experimental predictions of
their existence were correct, and I maintain that is because (unlike all
of the above) they were postulated to explain solid experimental results,
not just to satisfy some theoretical itch.

So yet again, your triumph of theory is actually a victory for experiment.
--
Steven.

Oct 16 '05 #28
"Steven D'Aprano" <st***@REMOVETHIScyber.com.au> wrote in message
news:pa****************************@REMOVETHIScybe r.com.au...
On Sun, 16 Oct 2005 15:16:39 +0200, Christian Stapfer wrote:
Come to think of an experience that I shared
with a student who was one of those highly
creative experimentalists you seem to have
in mind. He had just bought a new PC and
wanted to check how fast its floating point
unit was as compared to our VAX. After
having done his wonderfully creative
experimenting, he was utterly dejected: "Our (old)
VAX is over 10'000 times faster than my new PC",
he told me, almost in despair.
Which it was. It finished executing his code in almost 1/10,000th of the
time his PC could do.
Whereupon I,
always the uncreative, dogmatic theoretician,
who does not believe that much in the decisiveness
of the outcome of mere experiments, told him
that this was *impossible*, that he *must* have

It wasn't a mistake and it did happen.

Yes, yes, of course, it was a mistake, since
the conclusion that he wanted to draw from
this experiment was completely *wrong*.
Similarly, blind experimentalism *without*
supporting theory is mostly useless.
The VAX finished the calculation
10,000 times faster than his PC.
You have a strange concept of "impossible".
What about trying, for a change, to suppress
to quite unnecessarily long exchanges in this
NG.
It turned out that the VAX compiler had been
clever enough to hoist his simple-minded test
code out of the driving loop.
But, mind you, his test was meant to determine,
*not* the cleverness of the VAX compiler *but*
the speed of the floating-point unit. So his
experiment was a complete *failure* in this regard.

Optimizations have a tendency to make a complete mess of Big O
calculations, usually for the better. How does this support your
theory that Big O is a reliable predictor of program speed?
My example was meant to point out how
problematic it is to assume that experimental
outcomes (without carefully relating them
back to supporting theory) are quite *worthless*.
This story was not about Big-Oh notation but
a cautionary tale about the relation between
experiment and theory more generally.
- Got it now?
For the record, the VAX 9000 can have up to four vector processors each
running at up to 125 MFLOPS each, or 500 in total. A Pentium III runs at
about 850 Mflops. Comparing MIPS or FLOPS from one system to another is
very risky, for many reasons, but as a very rough and ready measure
of comparison, a four processor VAX 9000 is somewhere about the
performance of a P-II or P-III, give or take some fudge factor.
Well, that was in the late 1980s and our VAX
certanly most definitely did *not* have a
vector processor: we were doing work in
industrial automation at the time, not much
number-crunching in sight there.
So, depending on when your student did this experiment, it is entirely
conceivable that the VAX might have been faster even without the
optimization you describe.
Rubbish. Why do you want to go off a tangent like
this? Forget it! I just do not have the time to
start quibbling again.
Of course, you haven't told us what model VAX,
That's right. And it was *not* important. Since the
tale has a simple moral: Experimental outcomes
*without* supporting theory (be it of the Big-Oh
variety or something else, depending on context)
is mostly worthless.
so this comparison might not be relevant.
Your going off another tangent like this is
certainly not relevant to the basic insight
that experiments without supproting theory
are mostly worhtless, I'd say...
In fact, our VAX
calculated the body of the loop only *once*
and thus *immediately* announced that it had finished
the whole test - the compiler on this student's
PC, on the other hand, had not been clever enough
for this type of optimization: hence the difference...

Precisely. And all the Big O notation is the world will not tell you that.
Only an experiment will. Now, perhaps in the simple case of a bare loop
doing the same calculation over and over again, you might be able to
predict ahead of time what optimisations the compiler will do. But for
more complex algorithms, forget it.

This is a clear case of experimentation leading to the discovery
of practical results which could not be predicted from Big O calculations.

The only problem being: it was *me*, basing
myself on "theory", who rejected the "experimental
result" that the student had accepted *as*is*.
(The student was actually an engineer, I myself
had been trained as a mathematician. Maybe that
rings a bell?)
I find it quite mind-boggling that you would use as if it was a triumph
of abstract theoretical calculation when it was nothing of the sort.
This example was not at all meant to be any
such thing. It was only about: "experimenting
*without* relating experimental outcomes to
theory is mostly worthless". What's more:
supporting theory is also mostly worthless.
I think this is really a cautionary tale for
experimentalists: don't *believe* in the decisiveness
of the outcomes your experiments, but try to *understand*
of the situation)...

Or, to put it another way: your student discovered

No. You didn't read the story correctly.
The student had accepted the result of
his experiments at face value. It was only
because I had "theoretical" grounds to reject
that experimental outcome that he did learn
something in the process.
Why not, for a change, be a good loser?
something by running an experimental test of his code
that he would never have learnt in a million
years of analysis of his algorithm: the VAX compiler
was very cleverly optimized.

Ok, he did learn *that*, in the end. But he
did *also* learn to thoroughly mistrust the
outcome of a mere experiment. Experiments
(not just in computer science) are quite
frequently botched. How do you discover
botched experiments? - By trying to relate
experimental outcomes to theory.

Regards,
Christian
Oct 16 '05 #29
"Steven D'Aprano" <st***@REMOVETHIScyber.com.au> wrote in message
news:pa****************************@REMOVETHIScybe r.com.au...
On Sun, 16 Oct 2005 19:42:11 +0200, Christian Stapfer wrote:
Pauli's prediction of
the existence of the neutrino is another. It took
experimentalists a great deal of time and patience
(about 20 years, I am told) until they could finally
muster something amounting to "experimental proof"
of Pauli's conjecture.
Pauli's conjecture was the result of experimental evidence that was
completely inexplicable according to the theory of the day:

So was it mere experiment or was it the relation
between experiment and theory that provided
the spur for creative advancement? My position
is the latter. Mere experiment does not tell you
anything at all. Only experiment on the background
of suitable theory does that.
energy and
spin was disappearing from certain nuclear reactions. This was an
experimental result that needed to be explained, and Pauli's solution was
to invent an invisible particle that carried that energy and spin away.
Pauli's creativity lay in proposing *this*
particular solution to the puzzle. And, surely,
if it had not been for Pauli's characterization
of that hypothetical particle, experimentalists
like Cowan and Reines would not have *anything*
to aim for in the first place.

But I'm not going to argue Pauli's case any futher
in this NG, because this is, in the end,
not a physics NG...
(When I put it like that, it sounds stupid, but in fact it was an elegant
and powerful answer to the problem.)

The neutrino wasn't something that Pauli invented from theoretical first
principles. It came out of hard experimental results.

Physics of the last half century is littered with the half-forgotten
corpses of theoretical particles that never eventuated: gravitinos,
photinos, tachyons, rishons, flavons, hypercolor pre-quarks, axions,

Neutrinos and quarks are exceptional in that experimental predictions of
their existence were correct, and I maintain that is because (unlike all
of the above) they were postulated to explain solid experimental results,
not just to satisfy some theoretical itch.

So yet again, your triumph of theory is actually a victory for experiment.

Well, I might tell now the story of Maxwell,
sitting in his garden - and deducing, from
his equations (which, admittedly, were inspired
by earlier experimental work by Faraday),
something really quite shockingly *new*:
the existence of electromagnetic waves.

Regards,
Christian
--
»From a long view of the history of mankind -
seen from, say, ten thousand years from now
- there can be little doubt that the most
significant event of the nineteenth century
will be judged as Maxwell's discovery of the
laws of electrodynamics. The American Civil
War will pale into provincial insignificance
in comparison with this important scientific
- Richard P. Feynman: "The Feynman Lectures"
Oct 16 '05 #30
Christian Stapfer wrote:
Christian Stapfer wrote:
Christian Stapfer wrote:

>This discussion begins to sound like the recurring
>arguments one hears between theoretical and
>experimental physicists. Experimentalists tend
>to overrate the importance of experimental data
>(setting up a useful experiment, how to interpret
>the experimental data one then gathers, and whether
>one stands any chance of detecting systematic errors
>of measurement, all depend on having a good *theory*
>in the first place). Theoreticians, on the other hand,
>tend to overrate the importance of the coherence of
>theories. In truth, *both* are needed: good theories
>*and* carefully collected experimental data.
>
>Regards,
>Christian

An interesting parallel can be made concerning management of production
vs
management of creativity.

In general, production needs checks and feedback to insure quality, but
will often come to a stand still if incomplete resources are available.

Where as creativity needs checks to insure production, but in many cases
can still be productive even with incomplete or questionable resources.
The quality may very quite a bit in both directions, but in creative
tasks, that is to be expected.

In many ways programmers are a mixture of these two. I think I and
Steven
use a style that is closer to the creative approach. I get the feeling
your background may be closer to the production style.
This diagnosis reminds me of C.G. Jung, the psychologist,
who, after having introduced the concepts of extra- and
introversion, came to the conclusion that Freud was
an extravert whereas Adler an introvert. The point is
that he got it exactly wrong...

As to the value of complexity theory for creativity
in programming (even though you seem to believe that
a theoretical bent of mind can only serve to stifle
creativity), the story of the discovery of an efficient
string searching algorithm by D.E.Knuth provides an
interesting case in point. Knuth based himself on
seemingly quite "uncreatively theoretical work" (from
*your* point of view) that gave a *better* value for
the computational complexity of string searching
than any of the then known algorithms could provide.

Regards,
Christian
(even though you seem to believe that

a theoretical bent of mind can only serve to stifle
creativity)

No, that is not at all what I believe. What I believe is, "The insistence
of strict conditions can limit creative outcomes."

That's agreed. But going off *blindly*experimenting*
without trying to relate the outcome of that experimenting
back to ones theoretical grasp of the work one is doing
is *not* a good idea. Certainly not in the long run.
In fact, muddling-trough and avoiding the question
of suitable theoretical support for one's work is
perhaps more typical of production environments.

The lack of those limits does not prevent one from using any resources
(including theoretical ones) if they are available.

You seem to be rejecting experimental results in your views.

my posts in this thread and are simply projecting wildly,
as psychoanalysts would call it, that is all.

The term 'rejecting' was the wrong word in this case. But I still get
the impression you don't trust experimental methods.

As it appears, not even my most recent post has had
*any* recognizable effect on your thoroughly
misapprehending my position.

Regards,
Christian

In most cases being able to see things from different view points is
good. So I was offering an additional view point, not trying to
implying your's is less correct.

On a more practical level, Python as a language is a dynamic development
process. So the level of completeness of the documentation, and the
language it self, will vary a bit in some areas compared to others. So
as a programmer, it is often much more productive for me to try
something first and then change it later if it needs it. Of course I
would test it with a suitable range of data that represents the expected
range at some point.

In any case, this view point has already been expressed I think. <shrug>

Cheers,
Ron

Oct 16 '05 #31
Steven D'Aprano <st***@removethiscyber.com.au> wrote:
On Sun, 16 Oct 2005 15:16:39 +0200, Christian Stapfer wrote:
It turned out that the VAX compiler had been
clever enough to hoist his simple-minded test
code out of the driving loop.

Optimizations have a tendency to make a complete mess of Big O
calculations, usually for the better. How does this support your
theory that Big O is a reliable predictor of program speed?

There are many things that you cannot predict, however if the compiler was sufficiently documented and you had the
knowledge of the abovementioned peculiarity/optimization, you could take it into account. Bear in mind that the example
was given to show a problem with a purely experimental approach - it tends to show a tree and ignore the forest.
Sometimes this tree can be respresentative of a forest but many times it might not be.

The way I understood these notations was in terms of algorithmic behavior and data input sizes. It is generally
expected that certain basic operations will have a certain complexity. If this is not the case on a
particular platform (language, interpreter, cpu etc.) then there are several questions to ask: a) is such a deviation
documented?, b) why is there such a deviation in the first place? For example, if something is generally known to be
O(1) and your particular platform makes it O(n) then you have to ask why that is so. There might be a
perfectly good reason but this should still either be obvious or documented.

IMHO, I would rather first explore the theoretical boundaries to a certain approach before wasting time on coding up
stuff. If it is immediately obvious that such an approach will not yield anything acceptable for my own purposes then
what is the point of squeezing performance out of what will be dead-beat code anyways?

Knowing the tool/language/os in depth is a formidable strength and it is always a pleasure to see someone squeeze time
out of a piece of code solely based on knowing the internals of the compiler and/or runtime environment. However, it is
most usually the case that this person will be squeezing time out of a certain order of performance - no amount of
this kind of optimization will move the code to the next order.

Ognen
Oct 16 '05 #32
Ognen Duzlevski <ma****@norge.freeshell.org> writes:
Optimizations have a tendency to make a complete mess of Big O
calculations, usually for the better. How does this support your
theory that Big O is a reliable predictor of program speed?
There are many things that you cannot predict, however if the
compiler was sufficiently documented and you had the knowledge of
the abovementioned peculiarity/optimization, you could take it into
account. Bear in mind that the example was given to show a problem
with a purely experimental approach - it tends to show a tree and
ignore the forest. Sometimes this tree can be respresentative of a
forest but many times it might not be.

Consider the claim that earlier in the thread that adding to a hash
table is approximately O(1):

[Stephen D'Aprano] And knowing that hash tables are O(1) will not tell you that, will it?

There is only one practical way of telling: do the experiment. Keep
loading up that hash table until you start getting lots of collisions.

The complexity of hashing depends intricately on the the data and if
the data is carefully constructed by someone with detailed knowledge
of the hash implementation, it may be as bad as O(n) rather than O(1)
or O(sqrt(n)) or anything like that. Experimentation in the normal
will not discover something like that. You have to actually
understand what's going on. See for example:

http://www.cs.rice.edu/~scrosby/hash/
Oct 16 '05 #33
On Sun, 16 Oct 2005 20:28:55 +0200, Christian Stapfer wrote:
Experiments
(not just in computer science) are quite
frequently botched. How do you discover
botched experiments?

Normally by comparing them to the results of other experiments, and being
unable to reconcile the results. You may have heard the term "the
experiment was/was not replicable".

How do you discover whether your theory is correct? By comparing it to
itself, or by comparing it to experiment?

Of course you need some theoretical background in order to design your
experiment in the first place, otherwise you have no idea what you are
even looking for. And it is easy to mis-design an experiment, as your
student did, so that it is actually measuring X when you think it is
measuring Y. If you are about to argue against a naive view of the
scientific method where the scientist generates data in a mental vacuum,
don't bother, I understand that.

But fundamentally, calculated Big O values of algorithms on their own are
of virtually zero use in choosing between actual functions. If I tell you
that algorithm X is O(f(N)), what have I told you about it?

Have I told if it is unacceptably slow for some particular data size? No.

Have I told you how much work it will take to implement it? No.

Have I told you how fast my implementation is? No.

Have I told you that it is faster than some other algorithm Y? No.

The only thing I have told you is that in some rough and ready fashion, if
I increase the size of my data N, the amount of work done will increase
very approximately like f(N).

This disconnect between what Big O *actually* means and how you are
recommending we use it makes REAL PRACTICAL DIFFERENCE, and not in a good
way.

Here is a real problem I had to solve some time ago. Without using
regular expressions, I needed to find the position of a target
string in a larger string, but there were multiple targets. I
needed to find the first one.

I ended up using something like:

(untested, and from memory)

def findmany(s, *targets):
minoffset = (len(s)+1, "")
for t in targets:
p = s.find(t)
if p != -1 and p < minoffset[0]:
minoffset = (p, t)
return minoffset[0]

You would look at that, realise that s.find() is O(N) or even O(N**2) -- I
forget which -- and dismiss this as Shlemiel the Painter's algorithm which
is O(N**2) or worse.

http://www.joelonsoftware.com/articl...000000319.html

And you would be right in your theory -- but wrong in your conclusion.
Because then you would do what I did, which is spend hours or days writing
a "more efficient" O(N) searching function in pure Python, which ended up
being 100 times slower searching for a SINGLE target string than my
Shlemiel algorithm was searching for twenty targets. The speed benefit I
got from pushing the character matching from Python to C was so immense
that I couldn't beat it no matter how "efficient" the algorithm was.

If I had bothered to actually profile my original code, I would have
discovered that for any data I cared about, it worked not just acceptably
fast, but blindingly fast. Why would I care that if I had a terrabyte of
data to search for a million targets, it would scale badly? I was
searching text strings of less than a megabyte, for five or six targets.
The Big O analysis I did was completely, 100% correct, and completely,
100% useless. Not just useless in that it didn't help me, but it actually
hindered me, leading me to waste a day's work needlessly looking for a
"better algorithm".

--
Steven.

Oct 16 '05 #34
On Sun, 16 Oct 2005 14:07:37 -0700, Paul Rubin wrote:
The complexity of hashing depends intricately on the the data and if
the data is carefully constructed by someone with detailed knowledge
of the hash implementation, it may be as bad as O(n) rather than O(1)
or O(sqrt(n)) or anything like that. Experimentation in the normal
will not discover something like that. You have to actually
understand what's going on. See for example:

Yes, that is a very good point, and I suppose if a hostile user wanted to
deliberately construct a data set that showed off your algorithm to its
worst behaviour, they might do so. But if you are unlikely to discover
this worst case behaviour by experimentation, you are equally unlikely to
discover it in day to day usage. Most algorithms have "worst case"
behaviour significantly slower than their best case or average case, and
are still perfectly useful.
--
Steven.

Oct 16 '05 #35
Steven D'Aprano wrote:
On Sat, 15 Oct 2005 18:17:36 +0200, Christian Stapfer wrote:

I'd prefer a (however) rough characterization
of computational complexity in terms of Big-Oh
(or Big-whatever) *anytime* to marketing-type
characterizations like this one...

Oh how naive.

Why is it that even computer science undergrads
are required to learn the basics of Big-Oh and
all that?

So that they know how to correctly interpret what Big O notation means,
instead of misinterpreting it. Big O notation doesn't tell you everything
you need to know to predict the behaviour of an algorithm. It doesn't even
tell you most of what you need to know about its behaviour. Only actual
*measurement* will tell you what you need to know.

In my experience, I need both knowledge of algorithmic
complexity (in some pragmatic sense) and measurements.
Neither alone is sufficient.

The proponents of algorithmic complexity measures don't
make the mistake of thinking that constants don't matter
for real-world performance, but they also don't make the
mistake of thinking that you can always measure enough
to tell you how your code will perform in all situations
in which it might be used.

Measurement is complicated -- very often, it just shows
you that tuning to match cache sizes is greatly important
to keep the constant factors down. And yes, for small
data sizes often a linear-time algorithm can beat one
whose execution time grows only logarithmically, while
often a logarithmic time is close enough to constant over
the range of interest. How an operation runs on a heavily
can also be greatly different from what microbenchmarks
might suggest.

If we don't oversimplify, we'll measure some appropriate
performance numbers and combine that with some knowledge
of the effects of caches, algorithmic complexity and other
factors that might matter in given situations. And of
course there will be many situations where programmer time
and simplicity are more important than saving a millisecond,
or even a second, and we won't waste excessive resources in
optimising runtime at the expense of other factors.

-- James
Oct 17 '05 #36
Steven D'Aprano <st***@REMOVETHIScyber.com.au> writes:
But if you are unlikely to discover this worst case behaviour by
experimentation, you are equally unlikely to discover it in day to
day usage.
Yes, that's the whole point. Since you won't discover it by
experimentation and you won't discover it by day to day usage, you may
very well only find out about it when an attacker clobbers you. If
you want to prevent that, you HAVE to discover it by analysis, or at
least do enough analysis to determine that a successful attack won't
cause you a big catastrophe (ok, this is probably the case for most of
the stuff that most of us do).

Sure, there are some applications that are never exposed to hostile
users. That excludes pretty much anything that connects to the
internet or handles data that came from the internet. Any general
purpose development strategy that doesn't take hostile users into
account is of limited usefulness.
Most algorithms have "worst case" behaviour significantly slower
than their best case or average case, and are still perfectly useful.

Definitely true. However, a lot more of the time than many
implementers seem to think, you have to take the worst case into
account. There's no magic bullet like "experiments" or "unit tests"
that results in reliable software. You have to stay acutely aware of
what you're doing at every level.
Oct 17 '05 #37
Christian Stapfer <ni*@dev.nul> wrote:
This is why we would like to have a way of (roughly)
estimating the reasonableness of the outlines of a
having to write any code and/or test harness.

And we would also like to consume vast amounts of chocolate, while
similarly reclining in comfortable armchairs, without getting all fat
and flabby. Unfortunately, what we would like and what reality affords
are often pretty uncorrelated. No matter how much theoreticians may
love big-O because it's (relatively) easy to compute, it still has two
failings which are often sufficient to rule out its sufficiency for any
"estimate [of] the reasonableness" of anything: [a] as we operate on
finite machines with finite wordsize, we may never be able reach
anywhere even remotely close to the "asymptotic" region where big-O has
some relationship to reality; [b] in many important cases, the
theoretical worst-case is almost impossible to characterize and hardly
ever reached in real life, so big-O is of no earthly use (and much
harder to compute measures such as big-Theta should be used for just

Consider, for example, point [b]. Quicksort's big-O is N squared,
suggesting that quicksort's no better than bubblesort or the like. But
such a characterization is absurd. A very naive Quicksort, picking its
pivot very systematically (e.g., always the first item), may hit its
worst case just as systematically and in cases of practical importance
(e.g., already-sorted data); but it takes just a little extra care (in
the pivot picking and a few side issues) to make the worst-case
occurrences into ones that will not occur in practice except when the
input data has been deliberately designed to damage by a clever and

Designing based on worst-case occurrences hardly ever makes sense in any
field of engineering, and blind adherence to worst-case assessments can
be an unmitigated disaster, promoting inferior technology just because,
in the WORST imaginable case, the best available technology would fare
no better than the inferior one (even though in 99.99999% of cases the
best technology would perform better, if you're designing based on
worst-case analyses you may not even NOTICE that -- and NEVER, *NEVER*
forget that big-O is nothing BUT "extreme-worst-case" analysis!). Why
bother using prestressed concrete, when, should a large asteroid score a
direct hit, the costly concrete will stand up no better than cheap
bricks, or, for that matter, slightly-damp straw? Why bother doing
(e.g.) random pivot selection in quicksort, when its big-O (i.e.,
worst-case) behavior will remain N-squared, just like naive quicksort,
or, for that matter, bubblesort?
Alex
Oct 17 '05 #38
"Alex Martelli" <al*****@yahoo.com> wrote in message
news:1h4knvy.41r7fw1k4bge7N%al*****@yahoo.com...
Christian Stapfer <ni*@dev.nul> wrote:
This is why we would like to have a way of (roughly)
estimating the reasonableness of the outlines of a
having to write any code and/or test harness.
And we would also like to consume vast amounts of chocolate, while
similarly reclining in comfortable armchairs,

Maybe some of my inclination towards design
based on suitable *theories* (instead of
self-conditioning through testing) goes back
to the fact that I tend to think about the
design of my programs when no computer happens
to be near at hand to do some such experimenting,
or self-conditioning...
without getting all fat and flabby.
Well, thinking can be hard work. There is no need
to suggest an image of laziness. Thought experiments
are also quite often successful. Hardware engineers
can design very often entire gadgets without doing
a great deal of testing. They usually need to resort
to testing only if they know (or feel?) not to have
a sufficiently clear *theoretical* grasp of the
behavior of some part of their design.
Unfortunately, what we would like and what reality affords
are often pretty uncorrelated. No matter how much theoreticians may
love big-O because it's (relatively) easy to compute, it still has two
failings which are often sufficient to rule out its sufficiency for any
"estimate [of] the reasonableness" of anything: [a] as we operate on
finite machines with finite wordsize, we may never be able reach
anywhere even remotely close to the "asymptotic" region where big-O has
some relationship to reality; [b] in many important cases, the
theoretical worst-case is almost impossible to characterize and hardly
ever reached in real life, so big-O is of no earthly use (and much
harder to compute measures such as big-Theta should be used for just
But the fact remains that programmers, somewhat
experienced with the interface a module offers,
have a *rough*idea* of that computational complexity
attaches to what operations of that interface.
And having such a *rough*idea* helps them to
design reasonably performing programs much more
quickly.
Big-Oh and other asymptotic complexity measures
really do have *this* advantage over having
acquired, by way of conditioning experiences,
some such *rough*idea* of computational complexity:
they capture at least some of that "rough idea"
in a somewhat more easily communicable and much
more precise fashion.

Maybe you and Steven prefer to be conditioned,
Pavlov style, by the wonderful experiences that
you get while testing? - This is perhaps really
one of my *worst* psychological handicaps, I must
admit: that I don't *like* to get conditioned
like that, no matter how good it feels, no matter
how effective it might be for "practical" work that
one has to do.
I want to be able to really think *about* what
I am doing. And in order to be able to think about
it one usually needs some information about the
implementation, performance wise, of the language
features and the system modules that one might
want to use. If you happen to know of any *better*
way of offering the requisite information than
asymptotic complexity measures then, of course,
I am very grateful to hear more about it.
Consider, for example, point [b]. Quicksort's big-O is N squared,
suggesting that quicksort's no better than bubblesort or the like. But
such a characterization is absurd. A very naive Quicksort, picking its
pivot very systematically (e.g., always the first item), may hit its
worst case just as systematically and in cases of practical importance
(e.g., already-sorted data); but it takes just a little extra care (in
the pivot picking and a few side issues) to make the worst-case
occurrences into ones that will not occur in practice except when the
input data has been deliberately designed to damage by a clever and

Designing based on worst-case occurrences hardly ever makes
sense in any field of engineering,
What's wrong with wanting to have a rough idea
of what might happen in the worst case? I believe
many engineers are actually expected to think
about at least some "worst-case" scenarios.
Think of nuclear reactors, airplanes, or
telephone exchanges (and dont' think of Google
for a change). Don't you expect engineers
and scientists designing, for example, a nuclear
reactor, to think hard about what the worst-case
scenario might be? And how likely it might happen?
(And *no* testing whatsoever in that direction,

<snip/>
Why bother doing
(e.g.) random pivot selection in quicksort, when its big-O (i.e.,
worst-case) behavior will remain N-squared, just like naive quicksort,
or, for that matter, bubblesort?

Because worst-case is not the only measure of
computational complexity that one might be
interested in. For some applications one may
be able to accept relatively bad worst-case
behavior, if it doesn't happen too often.
This is why for these people we might provide
information about average case behavior (and
then the difference between quicksort and
bubblesort clearly shows).
For others, such worst-case behavior may not
be acceptable. For those other applications,
a good worst-case may be what is required.
This is why this second category of programmers
needs to know about the worst-case. - But I am
certainly belaboring the obvious...

Of course, the programs that have been designed
on the basis of such information can (and ought
to be) tested. Then some surprises (*not* predicted
by theory) might happen: but if they do not happen
too often, theory has done a good job - and so has
the programmer...

Regards,
Christian
Oct 17 '05 #39
Christian Stapfer <ni*@dev.nul> wrote:
"Alex Martelli" <al*****@yahoo.com> wrote in message
news:1h4knvy.41r7fw1k4bge7N%al*****@yahoo.com...
Christian Stapfer <ni*@dev.nul> wrote:
This is why we would like to have a way of (roughly)
estimating the reasonableness of the outlines of a
having to write any code and/or test harness.
And we would also like to consume vast amounts of chocolate, while
similarly reclining in comfortable armchairs,

Maybe some of my inclination towards design
based on suitable *theories* (instead of
self-conditioning through testing) goes back
to the fact that I tend to think about the
design of my programs when no computer happens
to be near at hand to do some such experimenting,
or self-conditioning...

Oh, I am as prone as anybody I know to do SW architecture and design in
bed when the lights are off and I'm sliding into sleep -- just about the
only case in which no computer is handy, or, rather, in which it's
generally unwise to turn the computer on (since it would interfere with
the sleep thing;-). Back before laptops were really affordable and
usable, I used to have a long bus commute, and did a lot of design with
pen and paper; and whiteboards are a popular group-design tool at
Google, no matter how many laptops or desktops happen to be around --
whiteboards are simply more suitable for "socialization" around a draft
design's sketch, than any computer-based tool I've ever seen.

But that's *design*, and most often in pretty early stages, too -- quite
a ways from *coding*. At that stage, one doesn't even generally commit
to a specific programming language or other for the eventual
implementation of the components one's considering! Rough ideas of
*EXPECTED* run-times (big-Theta) for various subcomponents one is
sketching are *MUCH* more interesting and important than "asymptotic
worst-case for amounts of input tending to infinity" (big-O) -- for
example, where I sketch-in (mentally, on paper, or on whiteboard) a
"hash table" subcomponent, I consider the *expected* (Theta) performance
(constant-time lookups), definitely NOT the big-O "linear time" lookups
which just MIGHT occur (if, say, all inputs just happened to hash to the
same value)... otherwise, I'd never use hash tables, right?-)

without getting all fat and flabby.

Well, thinking can be hard work. There is no need
to suggest an image of laziness. Thought experiments
are also quite often successful. Hardware engineers
can design very often entire gadgets without doing
a great deal of testing. They usually need to resort
to testing only if they know (or feel?) not to have
a sufficiently clear *theoretical* grasp of the
behavior of some part of their design.

Having been a hardware designer (of integrated circuits, for Texas
Instruments, and later briefly for IBM), before switching to software, I
can resolutely deny this assertion: only an utter madman would approve a
large production run of an IC who has not been EXTENSIVELY tested, in
simulations and quite possibly in breadboards and later in limited
pre-production runs. And any larger "gadget" USING ICs would be
similarly crazy to skimp on prototyping, simulation, and other testing
-- because, as every HW engineer KNOWS (SW ones often have to learn the
hard way), the distance between theory and practice, in practice, is
much larger than the distance between practice and theory should be in
theory;-).

Unfortunately, what we would like and what reality affords
are often pretty uncorrelated. No matter how much theoreticians may
love big-O because it's (relatively) easy to compute, it still has two
failings which are often sufficient to rule out its sufficiency for any
"estimate [of] the reasonableness" of anything: [a] as we operate on
finite machines with finite wordsize, we may never be able reach
anywhere even remotely close to the "asymptotic" region where big-O has
some relationship to reality; [b] in many important cases, the
theoretical worst-case is almost impossible to characterize and hardly
ever reached in real life, so big-O is of no earthly use (and much
harder to compute measures such as big-Theta should be used for just

But the fact remains that programmers, somewhat
experienced with the interface a module offers,
have a *rough*idea* of that computational complexity
attaches to what operations of that interface.
And having such a *rough*idea* helps them to
design reasonably performing programs much more
quickly.

A rough idea helps, particularly a rough idea of EXPECTED performance.
Big-Oh and other asymptotic complexity measures
really do have *this* advantage over having
acquired, by way of conditioning experiences,
some such *rough*idea* of computational complexity:
No: big-O is NOT AT ALL a measure, rough or otherwise, of EXPECTED
performance. It's *WORST-CASE*, by definition; and includes no
indications whatsoever of how close to the asymptote one can actually
get on a given finite-size machine. By both issues, it can be totally
misleading -- and, "it's not what you don't know, that hurts... it's
what you know WHICH IS NOT SO". By its MISLEADING characteristics,
big-O can be SERIOUSLY DAMAGING to the abilty of "designing on the back
they capture at least some of that "rough idea"
in a somewhat more easily communicable and much
more precise fashion.
Entirely precise, and therefore, in such cases as quicksort and hash
tables (hardly "obscure corner cases" -- CENTRAL PILLARS of many

Maybe you and Steven prefer to be conditioned,
Pavlov style, by the wonderful experiences that
you get while testing? - This is perhaps really
one of my *worst* psychological handicaps, I must
admit: that I don't *like* to get conditioned
like that, no matter how good it feels, no matter
how effective it might be for "practical" work that
one has to do.
I like to be able to reason FIRSTLY on the basis of EXPECTED, NORMAL
behavior, corrected in the first order by reasonable prudence and
caution, in the second order by accurate experimentation, and only in
the third order by considerations of worst-case scenarios -- the latter
normally tempered by estimates of their likelihood... which also
requires a rough understanding of CORRELATION between the causes of such
scenarios, which may be present, both directly or indirectly (assuming
that different components interacting in a system "may just happen" to
hit worst-case scenarios for each of them at once may be demonstrably
wrong in either direction -- and NO big-O analysis will ever help with
I want to be able to really think *about* what
I am doing. And in order to be able to think about
it one usually needs some information about the
implementation, performance wise, of the language
features and the system modules that one might
want to use. If you happen to know of any *better*
way of offering the requisite information than
asymptotic complexity measures then, of course,
I am very grateful to hear more about it.
If you can't think about a design without knowing such details about one
specific implementation of one specific programming language in which
that design might get implemented, I think this strongly suggests you're
giving insufficient attention to the one crucial skill in a designer's
mental armory: *ABSTRACTION*. There are many ways to cultivate one's
abstraction abilities. One of my favorite pastimes is collecting
different renditions of Bach's "Art of the Fugue", for example: Bach
carefully abstracted away the information about which instrument(s) were
supposed to be playing which voice(s), as well as many other details --
because of this, the Art of the Fugue has the highest abstraction level
among all well-known musical compositions, and each rendition may take a
different concrete reading of it. Learn to listen to them and be able
to tell what they have in common, and where they differ: it's a great
lesson in the understanding of abstraction. But it's probably only
appropriate if you like music, and Baroque in particular; other arts and
discipline may no doubt afford similarly good learning experiences, if
you know where to look for them.

continue by the exercise of *characterization* (of A FEW
implementations, ideally) which Jon Bentley (the programmer, not the
jazz player) suggests pretty close to the start of his masterpiece
"Writing Efficient Programs" (that great book may be hard to come by
these days, but the "Programming Pearls" books are also excellent and
communicate mostly the same messages, quite effectively). Get a sense
for the *EXPECTED* (NOT "asymptotic", for your inputs will NOT tend to
infinity; NOT "worst-case only", don't be THAT pessimistic) behavior of
some primitive operations - the couple of pages I devote to the subject
in the Nutshell chapter on optimization, profiling and testing may
suffice. Then refine your instinct by taking ONE complicated case, such
as the natural mergesort variant known as the timsort, and delving as
deep into its analysis as you dare -- characterize it as best you can
manage WITHOUT testing, simply by reasoning on its code (Tim's essay,
part of the Python source distribution, will be a helpful addition to a
thorought grounding in Knuth's chapter on sorting, for this purpose)...
then see what a difference it makes, to BE able to experiment!

You'll personally traverse, through such exercises, a curve not too
dissimilar from what the whole of Western thought went through in the
discovery and refinement of the experimental method (to some extent it
can be considered in full bloom in the thoughts and works of Galilei):
not the blind flailing around of pure trial and error (which HAD,
however, proved extremely fruitful in eliciting just about all technical
progress up to that age, and later), much less the ungrounded
elucubration of pure theoreticism (which HAD, mind you, given great
results once in a while, e.g. in Euclid, Nagarjuna, Archimedes...) --
but the powerful, fruitful merging of both strands into the incredibly
productive golden braid which has pulled progress up during the last few
centuries.

Consider, for example, point [b]. Quicksort's big-O is N squared,
suggesting that quicksort's no better than bubblesort or the like. But
such a characterization is absurd. A very naive Quicksort, picking its
pivot very systematically (e.g., always the first item), may hit its
worst case just as systematically and in cases of practical importance
(e.g., already-sorted data); but it takes just a little extra care (in
the pivot picking and a few side issues) to make the worst-case
occurrences into ones that will not occur in practice except when the
input data has been deliberately designed to damage by a clever and

Designing based on worst-case occurrences hardly ever makes
sense in any field of engineering,

What's wrong with wanting to have a rough idea
of what might happen in the worst case? I believe
many engineers are actually expected to think
about at least some "worst-case" scenarios.

Not extrapolating to infinity.
Think of nuclear reactors, airplanes, or
telephone exchanges (and dont' think of Google
for a change). Don't you expect engineers
and scientists designing, for example, a nuclear
reactor, to think hard about what the worst-case
scenario might be? And how likely it might happen?
A square hit by an asteroid of mass tending to infinity? No, I don't
expect nuclear reactors (nor anything else of human conception) to be
designed in consideration of what such an asteroid hit would do. And
yet, that's *EXACTLY* what would be indicated by your theory of big-O as
a guide to design: consider the absolute worst that could conceivably
happen, with *NO* indications WHATSOEVER of how unlikely it might be
(because for simplicity of computation you take limits for misfortune
tending to infinity!!!), and design for THAT.

If our collective ancestors had taken this attitude, we'd still all be
huddling in deep caves (possibly a better protection against "dinosaurs'
killer" levels of asteroid hits!), shivering in the cold (fire is FAR
too dangerous to survive a worst-case analysis, particularly with
damaging elements all tending to infinity, as your love affair with
big-O based designs would certainly indicate!!!). Animal skins? Forget
it!!! Do a perfectly pessimistic worst-case analysis with suitable
extrapolations to infinity and such skins would no doubt carry germs
enough to exterminate the budding human race (not that extinction might
not be preferable to the utterly miserable "lives" the few humans might

(And *no* testing whatsoever in that direction,
I would consider ANYBODY who built a nuclear reactor without AMPLE
testing dangerous enough for all of mankind to shoot on sight. I would
expect HUGE amounts of simulation-based testing, followed and
interspersed by prototypes (smaller and simplified) to validate that the
simulations' tests are indeed perfectly respondent to actual reality.

you're somehow implying that I'm trying to discourage people from
thinking IN USEFUL AND PRODUCTIVE WAYS, I challenge you to point to
anything I wrote here that could be construed that way.

<snip/>
Why bother doing
(e.g.) random pivot selection in quicksort, when its big-O (i.e.,
worst-case) behavior will remain N-squared, just like naive quicksort,
or, for that matter, bubblesort?
Because worst-case is not the only measure of
computational complexity that one might be
interested in. For some applications one may
be able to accept relatively bad worst-case
behavior, if it doesn't happen too often.

But big-O, which is what you advocate, gives *NO* indication of how
likely or unlikely it might be, in particular -- a TERRIBLE failing.
This is why for these people we might provide
information about average case behavior (and
then the difference between quicksort and
bubblesort clearly shows).
Of COURSE you might! Who, pray, is stopping you from so doing, except
perhaps your own laziness and the fact that you prefer to pontificate
constructively participating in the collective effort yourself?

You argued for big-O information, and now you're arguing for more
information (that's much harder to provide in mathematically rigorous
form) regarding "averages" (over WHAT distribution of input
permutations? Why would you think that all permutations are equally
likely? In the real world, they're not, but they ARE incredibly hard to
characterize -- you'd better pick a very specific, tiny subfield of
application for sorting, to be able to supply AMPLE experimental support
for your theories... theory without any experimentation to support it
can be worse than worthless, it can be truly DIS-informing!). Very
well, if you're so convinced this information will be precious (worth
more than, say, the optimizations and new components which people might
alternatively produce, investing as they prefer their own time and
efforts), LEAD BY EXAMPLE. Pick ONE relatively simple issue of
performance, and explore it to the level of breadth and depth you
believe "analytical" (as opposed to *experimental*) performance
characterization should go.

If you're willing to DO SOME WORK rather than SPEND YOUR TIME WHINING,
you will either produce some beautiful results whose practical
importance will stun others into following your lead, or find out that,
in the real world, there are more things in heaven and earth etc --
e.g., that behavior of virtual memory implementations SWAMPS any subtle
theoretical effects you thought would matter, for containers big enough
to matter, and well before getting anywhere close to the "asymptote" of
big-O and perhaps even big-Theta.

One way or another, something useful will be achieved, which surely

For others, such worst-case behavior may not
be acceptable. For those other applications,
a good worst-case may be what is required.
This is why this second category of programmers
needs to know about the worst-case. - But I am
certainly belaboring the obvious...
You are (perhaps without realizing it) pointing out that the big-O
characterization which you originally demanded is basicaly useless to
everybody (considering that a system has several subcomponents, and the
conditions under which each reaches worst-case are NOT necessarily
uncorrelated but might be positively or negatively correlated!). Except
perhaps theoreticians needing to publish theoretical papers, of
course;-).

Of course, the programs that have been designed
on the basis of such information can (and ought
to be) tested. Then some surprises (*not* predicted
by theory) might happen: but if they do not happen
too often, theory has done a good job - and so has
the programmer...

That would follow only if the total amount of work (properly accounting
for the huge amounts needed to collect "theoretically" sound
characterizations) was somehow reduced compared to alternative
approaches, based more on sound engineering practice and less on
reminescences of mathematics.

I believe that, out of all the iron suspension bridges designed and
built in the 19th century, ONE is standing -- the Brooklin Bridge.
Others were designed with "sounder theory" and (althought NOT real
"worst case analysis" -- none considered direct asteroid impacts, nor
did any designer "extrapolate to infinity"!!!-) tried to cover
"reasonable" worst cases as the best theory available to them afforded.
And they all went down in the following decades.

The designer of the Brooklin Bridge followed sound engineering practice:
he designed based on TYPICAL (NOT worst-case) behavior, supported by
small-scale prototypes, THEN, knowing perfectly well that he couldn't be
sure he knew exactly WHAT worst-case he needed to ward about... he
doubled the thickness of all steel ropes and load-bearing beams. So,
HIS bridge is still standing (as is, say, Renzo Piano's airport terminal
building in Japan, where ALL Japanese buildings all around crumbled to
dust in a terrible earthquake... Piano may be an architect rather than
an engineer, but my respect for his craft knows no bounds).

I gather you want nuclear reactors, and programs, designed by the praxis
of all those OTHER suspension bridge designers of the 19th century; I
want them designed by the praxis by which the Brooklin Bridge was
designed. But in this case, you have an excellent chance to prove me
wrong: just put some of your work where your mouth is! And I'll be
quite happy to help, because, even if (as I surmise) the attempt at
theoretical characterization proves pragmatically unsuccessful (in terms
of actual usefulness, and specifically of "bang for the buck"), even
then, if serious work has been put towards it, an empirically important
result is obtained.

I suspect you'll just find half-assed excuses to shirk the work which
your suggestions imply, but for once I'd be happy to be proved wrong...
Alex
Oct 17 '05 #40
al*****@yahoo.com (Alex Martelli) writes:
implementation of the components one's considering! Rough ideas of
*EXPECTED* run-times (big-Theta) for various subcomponents one is
sketching are *MUCH* more interesting and important than "asymptotic
worst-case for amounts of input tending to infinity" (big-O) -- for
I thought big-Theta meant the intersection of big-O (upper bound on
the worst case) and big-Omega (lower bound on the worst case).
example, where I sketch-in (mentally, on paper, or on whiteboard) a
"hash table" subcomponent, I consider the *expected* (Theta) performance
(constant-time lookups), definitely NOT the big-O "linear time" lookups
which just MIGHT occur (if, say, all inputs just happened to hash to the
same value)... otherwise, I'd never use hash tables, right?-)

You really have to be careful about choices like that. See:

http://www.cs.rice.edu/~scrosby/hash/

which I also cited last night.

Exercise: suspend disbelief for a moment and imagine that 1) Google
search works by spidering the web and building a giant hash table of
words that it finds in web pages, to use for servicing future queries;
2) The hash function is similiar to the one used in Python dicts and
is either public knowledge or else leaks out of the company somehow;
and 3) (biggest disbelief suspension of them all) I work for
Microsoft. Question: how could I use knowledge of the hash function
to give Google a hard time?

At least one well known implementer apparently does intend to quit
using hash tables due to considerations like this:

http://cr.yp.to/critbit.html
Oct 17 '05 #41
"Alex Martelli" <al*****@yahoo.com> wrote in message
news:1h4lfof.1jirgzd1rpwucsN%al*****@yahoo.com...
Christian Stapfer <ni*@dev.nul> wrote:
"Alex Martelli" <al*****@yahoo.com> wrote in message
news:1h4knvy.41r7fw1k4bge7N%al*****@yahoo.com...
> Christian Stapfer <ni*@dev.nul> wrote:
>
>> This is why we would like to have a way of (roughly)
>> estimating the reasonableness of the outlines of a
>> having to write any code and/or test harness.
>
> And we would also like to consume vast amounts of chocolate, while
> similarly reclining in comfortable armchairs,
Maybe some of my inclination towards design
based on suitable *theories* (instead of
self-conditioning through testing) goes back
to the fact that I tend to think about the
design of my programs when no computer happens
to be near at hand to do some such experimenting,
or self-conditioning...

Oh, I am as prone as anybody I know to do SW architecture and design in
bed when the lights are off and I'm sliding into sleep -- just about the
only case in which no computer is handy, or, rather, in which it's
generally unwise to turn the computer on (since it would interfere with
the sleep thing;-). Back before laptops were really affordable and
usable, I used to have a long bus commute, and did a lot of design with
pen and paper; and whiteboards are a popular group-design tool at
Google, no matter how many laptops or desktops happen to be around --
whiteboards are simply more suitable for "socialization" around a draft
design's sketch, than any computer-based tool I've ever seen.

But that's *design*, and most often in pretty early stages, too -- quite
a ways from *coding*. At that stage, one doesn't even generally commit
to a specific programming language or other for the eventual
implementation of the components one's considering! Rough ideas of
*EXPECTED* run-times (big-Theta) for various subcomponents one is
sketching are *MUCH* more interesting and important than "asymptotic
worst-case for amounts of input tending to infinity" (big-O) -- for
example, where I sketch-in (mentally, on paper, or on whiteboard) a
"hash table" subcomponent, I consider the *expected* (Theta) performance
(constant-time lookups), definitely NOT the big-O "linear time" lookups
which just MIGHT occur (if, say, all inputs just happened to hash to the
same value)... otherwise, I'd never use hash tables, right?-)

Well, Big-Oh, Big-Theta: these both are asymptotic
complexity measures, arent' they. So that's ok
with me.
> without getting all fat and flabby.

Well, thinking can be hard work. There is no need
to suggest an image of laziness. Thought experiments
are also quite often successful. Hardware engineers
can design very often entire gadgets without doing
a great deal of testing. They usually need to resort
to testing only if they know (or feel?) not to have
a sufficiently clear *theoretical* grasp of the
behavior of some part of their design.

Having been a hardware designer (of integrated circuits, for Texas
Instruments, and later briefly for IBM), before switching to software, I
can resolutely deny this assertion: only an utter madman would approve a
large production run of an IC who has not been EXTENSIVELY tested, in
simulations and quite possibly in breadboards and later in limited
pre-production runs.

Whoa, yet another straw-man attack!
First, remember, I do *not* advocate doing without
testing *altogether*: so here you are just attacking
a straw-man of your own making: that's *not* me.
What I wanted to say here is just that, in my
experience, as a close observer of "average
hardware designers", I believe that their testing
*usually* doesn't turn up major gotchas, and this
can only be because the theory they have about
their hardware will operate, is sufficiently
powerful. - "Usually": which means that exceptions
are possible, of course.
And any larger "gadget" USING ICs would be
similarly crazy to skimp on prototyping, simulation, and other testing
-- because, as every HW engineer KNOWS (SW ones often have to learn the
hard way), the distance between theory and practice, in practice, is
much larger than the distance between practice and theory should be in
theory;-).
> Unfortunately, what we would like and what reality affords
> are often pretty uncorrelated. No matter how much theoreticians may
> love big-O because it's (relatively) easy to compute, it still has two
> failings which are often sufficient to rule out its sufficiency for any
> "estimate [of] the reasonableness" of anything: [a] as we operate on
> finite machines with finite wordsize, we may never be able reach
> anywhere even remotely close to the "asymptotic" region where big-O has
> some relationship to reality; [b] in many important cases, the
> theoretical worst-case is almost impossible to characterize and hardly
> ever reached in real life, so big-O is of no earthly use (and much
> harder to compute measures such as big-Theta should be used for just
But the fact remains that programmers, somewhat
experienced with the interface a module offers,
have a *rough*idea* of that computational complexity
attaches to what operations of that interface.
And having such a *rough*idea* helps them to
design reasonably performing programs much more
quickly.

A rough idea helps, particularly a rough idea of EXPECTED performance.
Big-Oh and other asymptotic complexity measures
really do have *this* advantage over having
acquired, by way of conditioning experiences,
some such *rough*idea* of computational complexity:

No: big-O is NOT AT ALL a measure, rough or otherwise, of EXPECTED
performance.

Another straw-man attack. Do you really read
what others write? What I have written?
Big-Oh is not the real bone of contention
at all, nor is worst-case characterizations,
but the "practical usefulness" of any asymptotic
complexity measures whatsoever.
It's *WORST-CASE*, by definition; and includes no
indications whatsoever of how close to the asymptote one can actually
get on a given finite-size machine. By both issues, it can be totally
misleading -- and, "it's not what you don't know, that hurts... it's
what you know WHICH IS NOT SO". By its MISLEADING characteristics,
big-O can be SERIOUSLY DAMAGING to the abilty of "designing on the back
they capture at least some of that "rough idea"
in a somewhat more easily communicable and much
more precise fashion.
Entirely precise, and therefore, in such cases as quicksort and hash
tables (hardly "obscure corner cases" -- CENTRAL PILLARS of many

So *what* do we substitute for what little "precision"
asymptotic complexity measures can give us? - Just
mindless testing and having us conditioned to accept
marketing-style characterization of modules as "blazingly
fast", or "blindingly fast", or "not that fast"?
Maybe you and Steven prefer to be conditioned,
Pavlov style, by the wonderful experiences that
you get while testing? - This is perhaps really
one of my *worst* psychological handicaps, I must
admit: that I don't *like* to get conditioned
like that, no matter how good it feels, no matter
how effective it might be for "practical" work that
one has to do.

I like to be able to reason FIRSTLY on the basis of EXPECTED, NORMAL
behavior,

That's ok with me, and fits my position
sufficiently well that there really is
no room for much argument.
corrected in the first order by reasonable prudence and
caution, in the second order by accurate experimentation, and only in
the third order by considerations of worst-case scenarios -- the latter
normally tempered by estimates of their likelihood... which also
requires a rough understanding of CORRELATION between the causes of such
scenarios, which may be present, both directly or indirectly (assuming
that different components interacting in a system "may just happen" to
hit worst-case scenarios for each of them at once may be demonstrably
wrong in either direction -- and NO big-O analysis will ever help with
Why on earth are you always jumping on worst-case
analysis and Big-Oh (but think Big-Theta and average-case
just wonderful)? - Steven argued against the "practical
usefulness" of any asymptotic complexity measures whatever.
I want to be able to really think *about* what
I am doing. And in order to be able to think about
it one usually needs some information about the
implementation, performance wise, of the language
features and the system modules that one might
want to use. If you happen to know of any *better*
way of offering the requisite information than
asymptotic complexity measures then, of course,
I am very grateful to hear more about it.

If you can't think about a design without knowing such details about one
specific implementation of one specific programming language in which
that design might get implemented, I think this strongly suggests you're
giving insufficient attention to the one crucial skill in a designer's
mental armory: *ABSTRACTION*.

Another straw-man attack.
What, if anything, are asymptotic complexity measures
doing if not abstract-away a great deal of details?
There are many ways to cultivate one's
abstraction abilities. One of my favorite pastimes is collecting
different renditions of Bach's "Art of the Fugue", for example: Bach
carefully abstracted away the information about which instrument(s) were
supposed to be playing which voice(s), as well as many other details --
because of this, the Art of the Fugue has the highest abstraction level
among all well-known musical compositions, and each rendition may take a
different concrete reading of it. Learn to listen to them and be able
to tell what they have in common, and where they differ: it's a great
lesson in the understanding of abstraction.
Ok, we have something in common here: apparently we
both like Bach's polyphonic music.
However, no matter how much I might like Bach, my
preferred way of "learning abstraction abilities"
was studying mathematics.
But it's probably only
appropriate if you like music, and Baroque in particular; other arts and
discipline may no doubt afford similarly good learning experiences, if
you know where to look for them.

continue by the exercise of *characterization* (of A FEW
implementations, ideally) which Jon Bentley (the programmer, not the
jazz player) suggests pretty close to the start of his masterpiece
"Writing Efficient Programs" (that great book may be hard to come by
these days, but the "Programming Pearls" books are also excellent and
communicate mostly the same messages, quite effectively). Get a sense
for the *EXPECTED* (NOT "asymptotic", for your inputs will NOT tend to
infinity; NOT "worst-case only", don't be THAT pessimistic) behavior of
some primitive operations - the couple of pages I devote to the subject
in the Nutshell chapter on optimization, profiling and testing may
suffice. Then refine your instinct by taking ONE complicated case, such
as the natural mergesort variant known as the timsort, and delving as
deep into its analysis as you dare -- characterize it as best you can
manage WITHOUT testing, simply by reasoning on its code (Tim's essay,
part of the Python source distribution, will be a helpful addition to a
thorought grounding in Knuth's chapter on sorting, for this purpose)...
then see what a difference it makes, to BE able to experiment!
Your referring to Knuth's work in the context of
this discussion (and referring to him in order
to lecture me) seems a rather strange way
of operating.
You'll personally traverse, through such exercises, a curve not too
dissimilar from what the whole of Western thought went through in the
discovery and refinement of the experimental method
Western thought, for me, starts with Greek mathematics
and philosophy, not with Galileos experimentalism. This
is not to say that carefully executed experiments are
not important, far from it.
(to some extent it
can be considered in full bloom in the thoughts and works of Galilei):
not the blind flailing around of pure trial and error (which HAD,
however, proved extremely fruitful in eliciting just about all technical
progress up to that age, and later), much less the ungrounded
elucubration of pure theoreticism (which HAD, mind you, given great
results once in a while, e.g. in Euclid, Nagarjuna, Archimedes...) --
but the powerful, fruitful merging of both strands into the incredibly
productive golden braid which has pulled progress up during the last few
centuries.

> Consider, for example, point [b]. Quicksort's big-O is N squared,
> suggesting that quicksort's no better than bubblesort or the like. But
> such a characterization is absurd. A very naive Quicksort, picking its
> pivot very systematically (e.g., always the first item), may hit its
> worst case just as systematically and in cases of practical importance
> (e.g., already-sorted data); but it takes just a little extra care (in
> the pivot picking and a few side issues) to make the worst-case
> occurrences into ones that will not occur in practice except when the
> input data has been deliberately designed to damage by a clever and
>
> Designing based on worst-case occurrences hardly ever makes
> sense in any field of engineering,
What's wrong with wanting to have a rough idea
of what might happen in the worst case? I believe
many engineers are actually expected to think
about at least some "worst-case" scenarios.

Not extrapolating to infinity.
Think of nuclear reactors, airplanes, or
telephone exchanges (and dont' think of Google
for a change). Don't you expect engineers
and scientists designing, for example, a nuclear
reactor, to think hard about what the worst-case
scenario might be? And how likely it might happen?

A square hit by an asteroid of mass tending to infinity? No, I don't
expect nuclear reactors (nor anything else of human conception) to be
designed in consideration of what such an asteroid hit would do. And
yet, that's *EXACTLY* what would be indicated by your theory of big-O as
a guide to design: consider the absolute worst that could conceivably
happen, with *NO* indications WHATSOEVER of how unlikely it might be
(because for simplicity of computation you take limits for misfortune
tending to infinity!!!), and design for THAT.

If our collective ancestors had taken this attitude, we'd still all be
huddling in deep caves (possibly a better protection against "dinosaurs'
killer" levels of asteroid hits!), shivering in the cold (fire is FAR
too dangerous to survive a worst-case analysis, particularly with
damaging elements all tending to infinity, as your love affair with
big-O based designs would certainly indicate!!!). Animal skins? Forget
it!!! Do a perfectly pessimistic worst-case analysis with suitable
extrapolations to infinity and such skins would no doubt carry germs
enough to exterminate the budding human race (not that extinction might
not be preferable to the utterly miserable "lives" the few humans might

(And *no* testing whatsoever in that direction,

I would consider ANYBODY who built a nuclear reactor without AMPLE
testing dangerous enough for all of mankind to shoot on sight.

I just argued that the actual worst-case scenario
cannot - and should not - be tested (in the case
of nuclear reactors). And I still hold to *that*
assertion ;-)
I would
expect HUGE amounts of simulation-based testing,
Hullo! That's interesting: simulation-based testing.
But that's based on theory. Real experimentalism
substitutes experiment for theory, and therefore
action for thought.
followed and
interspersed by prototypes (smaller and simplified)
Again, here theory is needed: because you have
to use theory to deduce what that scale change
and simplification implies for the real case.
to validate that the
simulations' tests are indeed perfectly respondent to actual reality.

you're somehow implying that I'm trying to discourage people from
thinking IN USEFUL AND PRODUCTIVE WAYS, I challenge you to point to
anything I wrote here that could be construed that way.
The point was that executing (hastily slapped together)
tests - and devoutly believing in the output one gets -
is the way to go. This is foregoing some thinking before
acting, is it not?
<snip/>
> Why bother doing
> (e.g.) random pivot selection in quicksort, when its big-O (i.e.,
> worst-case) behavior will remain N-squared, just like naive quicksort,
> or, for that matter, bubblesort?

Because worst-case is not the only measure of
computational complexity that one might be
interested in. For some applications one may
be able to accept relatively bad worst-case
behavior, if it doesn't happen too often.

But big-O, which is what you advocate,

No, Big-Oh is just one of the several asymptotic
complexity measures under discussion. By pounding
on big-oh and mistaking it for a synonym for
"worst-case" is just attacking a straw-man (not me).
gives *NO* indication of how
likely or unlikely it might be, in particular -- a TERRIBLE failing.
Well, one can augment it with information about
the likelihood of its occurrence. Just because
big-oh (or average case, best case or worst
case) doesn't give you *everything* you would
like to known, doesn't mean it's worth nothing.
This is why for these people we might provide
information about average case behavior (and
then the difference between quicksort and
bubblesort clearly shows).

Of COURSE you might! Who, pray, is stopping you from so doing, except
perhaps your own laziness and the fact that you prefer to pontificate
constructively participating in the collective effort yourself?

I am not pontificating at all (but,
maybe, it is you pontificating here).
I was just suggesting that in order
to have a sufficiently clear idea about
what computational complexity attaches
to what feature of this or that library
module, I would have to read the source
and figure out how it's being done.
But that's not really a very effective
way of operating - not just for me
but for all the many who find themselves
in a similar situation. Note too, that
whoever implements a libary module not
only knows his implementation very well,
but has, by definition, already invested
some amount of thought into the question
of computational complexity of what
his module offers. So why not just write
it down, for a change?
You argued for big-O information, and now you're arguing for more
information (that's much harder to provide in mathematically rigorous
form) regarding "averages" (over WHAT distribution of input
permutations? Why would you think that all permutations are equally
likely? In the real world, they're not, but they ARE incredibly hard to
characterize -- you'd better pick a very specific, tiny subfield of
application for sorting, to be able to supply AMPLE experimental support
for your theories... theory without any experimentation to support it
can be worse than worthless, it can be truly DIS-informing!). Very
well, if you're so convinced this information will be precious (worth
more than, say, the optimizations and new components which people might
alternatively produce, investing as they prefer their own time and
efforts), LEAD BY EXAMPLE. Pick ONE relatively simple issue of
performance, and explore it to the level of breadth and depth you
believe "analytical" (as opposed to *experimental*) performance
characterization should go.

If you're willing to DO SOME WORK rather than SPEND YOUR TIME WHINING,
Ha, ha, another figment of your imagination.
Now I am "whining". Not at all. The very verbosity
of your reaction to my not that long post seems to
indicate that what I had written really did hurt
you - and certainly *not* because it was nonsensical.
Nonsense never hurts anybody - except the one who
is uttering it.
you will either produce some beautiful results whose practical
importance will stun others into following your lead, or find out that,
in the real world, there are more things in heaven and earth etc --
e.g., that behavior of virtual memory implementations SWAMPS any subtle
theoretical effects you thought would matter, for containers big enough
to matter, and well before getting anywhere close to the "asymptote" of
big-O and perhaps even big-Theta.

One way or another, something useful will be achieved, which surely
Let's limit this to the present post...
For others, such worst-case behavior may not
be acceptable. For those other applications,
a good worst-case may be what is required.
This is why this second category of programmers
needs to know about the worst-case. - But I am
certainly belaboring the obvious...

You are (perhaps without realizing it) pointing out that the big-O
characterization which you originally demanded is basicaly useless to
everybody (considering that a system has several subcomponents, and the
conditions under which each reaches worst-case are NOT necessarily
uncorrelated but might be positively or negatively correlated!). Except
perhaps theoreticians needing to publish theoretical papers, of
course;-).

... and Computer Science *undergrads* required to
learn about it? For what terrible fools are
you mistaking their profs?
Of course, the programs that have been designed
on the basis of such information can (and ought
to be) tested. Then some surprises (*not* predicted
by theory) might happen: but if they do not happen
too often, theory has done a good job - and so has
the programmer...

That would follow only if the total amount of work (properly accounting
for the huge amounts needed to collect "theoretically" sound
characterizations)

Certainly, if every programmer has to figure
out such theoretically sound characterizations
from scratch (by reading and analyzing the
relevant library source code) then it's really
going to be a huge amount of work for a huge
number of people. And that was just my point...
was somehow reduced compared to alternative
approaches, based more on sound engineering practice and less on
reminescences of mathematics.

I believe that, out of all the iron suspension bridges designed and
built in the 19th century, ONE is standing -- the Brooklin Bridge.
Others were designed with "sounder theory" and (althought NOT real
"worst case analysis" -- none considered direct asteroid impacts, nor
did any designer "extrapolate to infinity"!!!-) tried to cover
"reasonable" worst cases as the best theory available to them afforded.
And they all went down in the following decades.

The designer of the Brooklin Bridge followed sound engineering practice:
he designed based on TYPICAL (NOT worst-case) behavior, supported by
small-scale prototypes, THEN, knowing perfectly well that he couldn't be
sure he knew exactly WHAT worst-case he needed to ward about... he
doubled the thickness of all steel ropes and load-bearing beams. So,
HIS bridge is still standing (as is, say, Renzo Piano's airport terminal
building in Japan, where ALL Japanese buildings all around crumbled to
dust in a terrible earthquake... Piano may be an architect rather than
an engineer, but my respect for his craft knows no bounds).

I gather you want nuclear reactors, and programs, designed by the praxis
of all those OTHER suspension bridge designers of the 19th century; I
want them designed by the praxis by which the Brooklin Bridge was
designed. But in this case, you have an excellent chance to prove me
wrong: just put some of your work where your mouth is! And I'll be
quite happy to help, because, even if (as I surmise) the attempt at
theoretical characterization proves pragmatically unsuccessful (in terms
of actual usefulness, and specifically of "bang for the buck"), even
then, if serious work has been put towards it, an empirically important
result is obtained.

I suspect you'll just find half-assed excuses to shirk the work which
your suggestions imply, but for once I'd be happy to be proved wrong...

you invested to write this down. However, being
not only something of an "armchair programmer"
but also an "armchair psychologist (if not
psychoanalyst)" I cannot help smiling at this
point...

Regards,
Christian
Oct 18 '05 #42

This thread has been closed and replies have been disabled. Please start a new discussion.