Bytes IT Community

Fun with fancy slicing

Hey all,

I just realized you can very easily implement a sequence grouping function
using Python 2.3's fancy slicing support:

def group(values, size):
    return map(None, *[values[i::size] for i in range(size)])

>>> group(range(20), 4)
[(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14, 15),
 (16, 17, 18, 19)]
>>> group(range(14), 3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]

I had to use map(None, *...) instead of zip(*...) to transpose the result
because zip throws away the "orphans" at the end. Anyway, this is a useful
function to have in your toolkit if you need to do pagination or
multi-column display, among other things...
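(For anyone reading this on Python 3, where `map(None, ...)` is gone, a sketch of the same trick with `itertools.zip_longest` doing the orphan-keeping transpose — the function name is mine:)

```python
from itertools import zip_longest

def group(values, size):
    # Transpose the `size` stride-slices; zip_longest keeps the trailing
    # "orphans", padding the last tuple with None as map(None, *...) did.
    return list(zip_longest(*[values[i::size] for i in range(size)]))

print(group(list(range(14)), 3))
# -> [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
```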

Anyone have any other interesting applications of the extended slice syntax?

Peace,
Dave

--
..:[ dave benjamin (ramenboy) -:- www.ramenfest.com -:- www.3dex.com ]:.
: d r i n k i n g l i f e o u t o f t h e c o n t a i n e r :
Jul 18 '05 #1
32 Replies


Dave Benjamin <ra***@lackingtalent.com> writes:
Anyone have any other interesting applications of the extended slice syntax?


You can use it in a sieve prime finder (ooh, thread crossing :-):

>>> def sieve(p):
...     candidates = range(p)
...     for i in candidates[2:]:
...         if not candidates[i]: continue
...         n = len(candidates[2*i::i])
...         candidates[2*i::i] = [0]*n
...     return filter(None, candidates[2:])
...
>>> sieve(50)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

but it's not especially efficient (in speed or memory). Quite cute,
though.
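(A Python 3 sketch of the same slice-assignment idea, using a boolean sieve and only striking multiples from i*i up — the details beyond the post's own code are mine:)

```python
def sieve(limit):
    # Boolean variant of the extended-slice sieve: one slice assignment
    # clears every multiple of i in a single step.
    is_prime = [True] * limit
    is_prime[:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            is_prime[i*i::i] = [False] * len(is_prime[i*i::i])
    return [n for n, flag in enumerate(is_prime) if flag]

print(sieve(50))
# -> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```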

Cheers,
mwh

--
A typical luser can perform truly superhuman feats of strength &
dexterity to avoid reading documentation. -- Lionel, asr
Jul 18 '05 #2

In article <7h*************@pc150.maths.bris.ac.uk>, Michael Hudson wrote:
Dave Benjamin <ra***@lackingtalent.com> writes:
Anyone have any other interesting applications of the extended slice syntax?


You can use it in a sieve prime finder (ooh, thread crossing :-):
...
but it's not especially efficient (in speed or memory). Quite cute,
though.


But quite expressive! I'm amazed at how small that code was. It reminds me
of the trademark quicksort implementation in Haskell, ie. it may not be
great in practice, but you can see the algorithm at a high level which makes
it easier to figure out what's going on.

Anyway, thanks. That was cool. =)

Peace,
Dave

--
..:[ dave benjamin (ramenboy) -:- www.ramenfest.com -:- www.3dex.com ]:.
: d r i n k i n g l i f e o u t o f t h e c o n t a i n e r :
Jul 18 '05 #3

Dave Benjamin wrote:
...
of the trademark quicksort implementation in Haskell, ie. it may not be
great in practice, but you can see the algorithm at a high level which
makes it easier to figure out what's going on.


Hmmm, you mean as in...:

def quicksort(alist):
    head = alist[0]
    return quicksort([ x in alist[1:] if x<=head ]) + [
        head ] + quicksort([ x in alist[1:] if x>head ])

...?
Alex

Jul 18 '05 #4

In article <ue**********************@news2.tin.it>,
Alex Martelli <al***@aleax.it> wrote:
Dave Benjamin wrote:
...
of the trademark quicksort implementation in Haskell, ie. it may not be
great in practice, but you can see the algorithm at a high level which
makes it easier to figure out what's going on.


Hmmm, you mean as in...:

def quicksort(alist):
    head = alist[0]
    return quicksort([ x in alist[1:] if x<=head ]) + [
        head ] + quicksort([ x in alist[1:] if x>head ])

...?


Well, except that it gets a syntax error. And if you correct the syntax
it gets an IndexError due to the lack of a base case for the recursion.
And if you correct that, it still doesn't work correctly for lists with
repeated items.

How about:

def quicksort(L):
    if len(L) < 2: return L
    return quicksort([x for x in L if x < L[0]]) + \
           [x for x in L if x == L[0]] + \
           quicksort([x for x in L if x > L[0]])

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
Jul 18 '05 #5

Dave Benjamin wrote:
In article <7h*************@pc150.maths.bris.ac.uk>, Michael Hudson wrote:
Dave Benjamin <ra***@lackingtalent.com> writes:
Anyone have any other interesting applications of the extended slice syntax?


You can use it in a sieve prime finder (ooh, thread crossing :-):
...
but it's not especially efficient (in speed or memory). Quite cute,
though.


I do not seem to have received this message from Michael Hudson through
python-list. I am able to locate it on the newsgroup. Does anyone else
have this problem?

Gerrit Holl.

--
278. If any one buy a male or female slave, and before a month has
elapsed the benu-disease be developed, he shall return the slave to the
seller, and receive the money which he had paid.
-- 1780 BC, Hammurabi, Code of Law
--
Asperger Syndroom - een persoonlijke benadering:
http://people.nl.linux.org/~gerrit/
Het zijn tijden om je zelf met politiek te bemoeien:
http://www.sp.nl/

Jul 18 '05 #6

David Eppstein wrote:
...
def quicksort(alist):
    head = alist[0]
    return quicksort([ x in alist[1:] if x<=head ]) + [
        head ] + quicksort([ x in alist[1:] if x>head ])
Well, except that it gets a syntax error. And if you correct the syntax
it gets an IndexError due to the lack of a base case for the recursion.


Right on both counts -- sorry.
And if you correct that, it still doesn't work correctly for lists with
repeated items.


Why not? Having fixed the syntax and added the base-case, I see:

[alex@lancelot perex]$ python -i se.py
>>> import random
>>> x=4*range(5)
>>> random.shuffle(x)
>>> x
[4, 4, 3, 1, 1, 1, 1, 2, 0, 0, 3, 4, 3, 4, 2, 0, 3, 0, 2, 2]
>>> quicksort(x)
[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]


So what am I missing? Please show a case of "list with repeated
items" where the fixed quicksort "doesn't work correctly", thanks.

btw, my favourite fixed version is:

def quicksort(alist):
    try: head = alist[0]
    except IndexError: return alist
    return quicksort([ x for x in alist[1:] if x<=head ]) + [
        head ] + quicksort([ x for x in alist[1:] if x>head ])

but I do see that testing len(alist) is probably better.

How sweet it would be to be able to unpack by coding:
    head, *tail = alist

...
Alex

Jul 18 '05 #7

In article <xd**********************@news2.tin.it>,
Alex Martelli <al***@aleax.it> wrote:
David Eppstein wrote:
...
def quicksort(alist):
    head = alist[0]
    return quicksort([ x in alist[1:] if x<=head ]) + [
        head ] + quicksort([ x in alist[1:] if x>head ])


Well, except that it gets a syntax error. And if you correct the syntax
it gets an IndexError due to the lack of a base case for the recursion.


Right on both counts -- sorry.
And if you correct that, it still doesn't work correctly for lists with
repeated items.


Why not? Having fixed the syntax and added the base-case, I see:


Sorry, does work correctly, just not efficiently. I didn't see the = in
the first recursive call. But if you call your quicksort on say [0]*n,
each call will split the list as unevenly as possible and you get
quadratic runtime. Of course we know that nonrandom quicksort pivoting
can be quadratic anyway (e.g. on sorted input) but this to my mind is
worse because randomization doesn't make it any better. The fact that
some textbooks (e.g. CLRS!) make this mistake doesn't excuse it either.

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
Jul 18 '05 #8

* David Eppstein <ep******@ics.uci.edu> in comp.lang.python:
Of course we know that nonrandom quicksort pivoting can be quadratic
anyway (e.g. on sorted input) but this to my mind is worse because
randomization doesn't make it any better. The fact that some textbooks
(e.g. CLRS!) make this mistake doesn't excuse it either.


Could you explain in more detail the error made in CLRS on this topic
(with a reference if possible) ? I did not precisely catch your
explanation here.

Thanks in advance,
--
Damien Wyart
Jul 18 '05 #9

Gerrit Holl <ge****@nl.linux.org> writes:
Dave Benjamin wrote:
In article <7h*************@pc150.maths.bris.ac.uk>, Michael Hudson wrote:
Dave Benjamin <ra***@lackingtalent.com> writes:

> Anyone have any other interesting applications of the extended slice syntax?

You can use it in a sieve prime finder (ooh, thread crossing :-):
...
but it's not especially efficient (in speed or memory). Quite cute,
though.


I do not seem to have received this message from Michael Hudson through
python-list. I am able to locate it on the newsgroup. Does anyone else
have this problem?


Hmm, I saw it :-)

Google also has it:

http://groups.google.com/groups?selm...ths.bris.ac.uk

I don't really read the newsgroup closely enough any more to know if
any messages are going missing, I'm afraid.

Cheers,
mwh
--
Screaming 14-year-old boys attempting to prove to each other that
they are more 3133t than j00.
-- Reason #8 for quitting slashdot today, from
http://www.cs.washington.edu/homes/k.../slashdot.html
Jul 18 '05 #10

In article <3f***********************@news.free.fr>,
Damien Wyart <da**********@free.fr> wrote:
* David Eppstein <ep******@ics.uci.edu> in comp.lang.python:
Of course we know that nonrandom quicksort pivoting can be quadratic
anyway (e.g. on sorted input) but this to my mind is worse because
randomization doesn't make it any better. The fact that some textbooks
(e.g. CLRS!) make this mistake doesn't excuse it either.


Could you explain in more detail the error made in CLRS on this topic
(with a reference if possible) ? I did not precisely catch your
explanation here.

Thanks in advance,


CLRS' partition routine ends up partitioning an n-item array into the
items <= the pivot, the pivot itself, and the items > the pivot.
If the items are all equal, that means that the first part gets n-1
items and the last part gets nothing, regardless of which item you
select as pivot. If you analyze the algorithm on this input, you get
the recurrence T(n)=O(n)+T(n-1)+T(0) which solves to O(n^2).

Throughout most of the quicksort chapter, CLRS at least include
exercises mentioning the case of all equal inputs, asking what happens
for those inputs, and in one exercise suggesting a partition routine
that might partition those inputs more equally (but who knows what it
should do when merely most of the inputs are equal...) However in the
randomized quicksort section they ignored this complication and seemed
to be claiming that their randomized quicksort has O(n log n) expected
time, unconditionally.

This problem was finally corrected in the fourth printing of the second
edition; see the errata at
<http://www.cs.dartmouth.edu/~thc/clrs-2e-bugs/bugs.php>
for more details.
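To make the failure mode concrete, here is a small Python 3 sketch (the helper names and the depth-counting are mine, not from the thread) contrasting the <=/> split with the three-way split on all-equal input:

```python
def quicksort_2way(L, depth=0):
    # CLRS-style split: "<= pivot" left, "> pivot" right. On all-equal
    # input the left part always gets n-1 items, whichever pivot you pick,
    # so recursion depth is linear and total work quadratic.
    if len(L) < 2:
        return L, depth
    head = L[0]
    left, dl = quicksort_2way([x for x in L[1:] if x <= head], depth + 1)
    right, dr = quicksort_2way([x for x in L[1:] if x > head], depth + 1)
    return left + [head] + right, max(dl, dr)

def quicksort_3way(L, depth=0):
    # Three-way split: items equal to the pivot never recurse, so
    # all-equal input finishes in a single level.
    if len(L) < 2:
        return L, depth
    head = L[0]
    left, dl = quicksort_3way([x for x in L if x < head], depth + 1)
    right, dr = quicksort_3way([x for x in L if x > head], depth + 1)
    return left + [x for x in L if x == head] + right, max(dl, dr)

print(quicksort_2way([0] * 60)[1])   # recursion depth 59
print(quicksort_3way([0] * 60)[1])   # recursion depth 1
```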

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
Jul 18 '05 #11

Alex Martelli wrote:
How sweet it would be to be able to unpack by coding:
head, *tail = alist


Indeed! I came across another use case for this
recently, as well. I was trying to parse some
command strings of the form

command arg arg ...

where some commands had args and some didn't.
I wanted to split the command off the front
and keep the args for later processing once
I'd decoded the command. My first attempt
went something like

command, args = cmdstring.split(" ", maxsplit = 1)

but this fails when there are no arguments,
because the returned list has only one element
in that case.

It would have been very nice to be able to
simply say

command, *args = cmdstring.split()

and get the command as a string and a
list of 0 or more argument strings.

I really ought to write a PEP about this...
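(As it turned out, this wish eventually became PEP 3132, extended iterable unpacking, landed in Python 3.0; a quick sketch of exactly the behaviour described above:)

```python
command, *args = "open file.txt ro".split()
assert command == "open" and args == ["file.txt", "ro"]

command, *args = "quit".split()
assert args == []              # zero arguments: empty list, no ValueError

head, *body, last = [1, 2, 3, 4]
assert (head, body, last) == (1, [2, 3], 4)
```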

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg

Jul 18 '05 #12

Greg Ewing (using news.cis.dfn.de) wrote:
Alex Martelli wrote:
How sweet it would be to be able to unpack by coding:
head, *tail = alist

Indeed! I came across another use case for this
recently, as well. I was trying to parse some
command strings of the form

command arg arg ...

where some commands had args and some didn't.
I wanted to split the command off the front
and keep the args for later processing once
I'd decoded the command. My first attempt
went something like

command, args = cmdstring.split(" ", maxsplit = 1)

but this fails when there are no arguments,
because the returned list has only one element
in that case.

It would have been very nice to be able to
simply say

command, *args = cmdstring.split()

and get the command as a string and a
list of 0 or more argument strings.

I really ought to write a PEP about this...


In the mean time, it isn't too hard to write a function which does this:

def first_rest(x):
    return x[0], x[1:]

command, args = first_rest(cmdstring.split())

or, in one step

def chop_word(s):
    return first_rest(s.split())

command, args = chop_word(cmdstring)

David

Jul 18 '05 #13

|Alex Martelli wrote:
|> How sweet it would be to be able to unpack by coding:
|> head, *tail = alist

"Greg Ewing (using news.cis.dfn.de)" <g2********@sneakemail.com> wrote previously:
| command arg arg ...
| command, *args = cmdstring.split()
|and get the command as a string and a list of 0 or more argument strings.

I think I've written on this list before that I would like this
too--Ruby does this, IIRC. If anyone writes a PEP, you have my +1 in
advance.

For Greg's particular case, one approach that doesn't look too bad is:

command = args.pop(0)
# ... do stuff ...
try:
    more = args.pop(0)
except IndexError:
    pass  # no more

For a really long list, it would be faster to do an 'args.reverse()' up
front, and '.pop()' off the end (Greg and Alex know all this, of course).
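(These days `collections.deque`, added in Python 2.4, sidesteps the reverse-or-pay-O(n) question entirely with an O(1) `popleft()`; a sketch:)

```python
from collections import deque

args = deque("cmd alpha beta".split())
command = args.popleft()       # O(1), no reverse() dance needed
assert command == "cmd"
assert list(args) == ["alpha", "beta"]
```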

Yours, Lulu...

--
mertz@ _/_/_/_/ THIS MESSAGE WAS BROUGHT TO YOU BY: \_\_\_\_ n o
gnosis _/_/ Postmodern Enterprises \_\_
..cx _/_/ \_\_ d o
_/_/_/ IN A WORLD W/O WALLS, THERE WOULD BE NO GATES \_\_\_ z e
Jul 18 '05 #14

In article <Jd7fb.484435$cF.170532@rwcrnsc53>,
"David C. Fox" <da*******@post.harvard.edu> wrote:
In the mean time, it isn't too hard to write a function which does this:

def first_rest(x):
    return x[0], x[1:]


Shouldn't that be called cons?

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
Jul 18 '05 #15

David Eppstein wrote:
In the mean time, it isn't too hard to write a function which does this:

def first_rest(x):
    return x[0], x[1:]


Shouldn't that be called cons?


Hmmm, feels more like the 'reverse' of a cons to me -- takes a list
and gives me the car and cdr...
Alex

Jul 18 '05 #16

Lulu of the Lotus-Eaters wrote:
...
|> head, *tail = alist

"Greg Ewing (using news.cis.dfn.de)" <g2********@sneakemail.com> wrote
previously:
| command arg arg ...
| command, *args = cmdstring.split()
|and get the command as a string and a list of 0 or more argument strings.

I think I've written on this list before that I would like this
too--Ruby does this, IIRC. If anyone writes a PEP, you have my +1 in
advance.

For Greg's particular case, one approach that doesn't look too bad is:

command = args.pop(0)
# ... do stuff ...
try:
    more = args.pop(0)
except IndexError:
    pass  # no more
I'm not sure what this 'more' is about, actually. Greg's case is
currently best solved [IMHO] with

    args = cmdstring.split()
    command = args.pop(0)
For a really long list, it would be faster to do an 'args.reverse()' up
front, and '.pop()' off the end (Greg and Alex know all this, of course).


Actually, I don't -- both args.reverse() and args.pop(0) are O(len(args)),
so I don't know their relative speeds offhand. Interestingly enough,
timeit.py, for once, doesn't want to let me know about them either...:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags.pop(0)'
Traceback (most recent call last):
File "timeit.py", line 249, in main
x = t.timeit(number)
File "timeit.py", line 158, in timeit
return self.inner(it, self.timer)
File "<timeit-src>", line 6, in inner
x=ags.pop(0)
IndexError: pop from empty list
[alex@lancelot python2.3]$

....since each .pop is modifying the ags list, the once-only setup
statement doesn't suffice... OK, so we need to repeat (and thus,
alas, intrinsically measure) the list copying too (sigh):

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24.5 usec per loop

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'ags.reverse(); x=ags[:].pop()'
10000 loops, best of 3: 23.2 usec per loop

the results of the two snippets aren't identical, but that should
not affect the timing measured (as the use of ags[:] and the
repetition are time-measurement artefacts anyway). So, it does
look like reversing first IS faster -- by a hair. Trying to
remove the ags[:] part / overhead gave me a shock, though...:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 35.2 usec per loop

So -- it's clear to me that I do NOT understand what's going on
here. If just the ags[:] is 35.2 usec, how can the ags[:].pop(0)
be 24.5 ... ???

Tim...? Please HELP a fellow bot in need...!!!
Alex

Jul 18 '05 #17

Alex Martelli <al***@aleax.it> writes:
So -- it's clear to me that I do NOT understand what's going on
here. If just the ags[:] is 35.2 usec, how can the ags[:].pop(0)
be 24.5 ... ???


How quiet was your machine when you ran the tests? I see behaviour
more in line with what you'd expect:

[mwh@pc150 build]$ ./python ../Lib/timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 31.6 usec per loop
[mwh@pc150 build]$ ./python ../Lib/timeit.py -s'ags=range(1000)' 'x=ags[:].pop(0)'
10000 loops, best of 3: 39.8 usec per loop

Cheers,
mwh

--
About the use of language: it is impossible to sharpen a
pencil with a blunt axe. It is equally vain to try to do
it with ten blunt axes instead.
-- E.W.Dijkstra, 18th June 1975. Perl did not exist at the time.
Jul 18 '05 #18

In article <22**********************@news1.tin.it>,
Alex Martelli <al***@aleax.it> wrote:
David Eppstein wrote:
In the mean time, it isn't too hard to write a function which does this:

def first_rest(x):
    return x[0], x[1:]


Shouldn't that be called cons?


Hmmm, feels more like the 'reverse' of a cons to me -- takes a list
and gives me the car and cdr...


Well, but it returns an object composed of a car and a cdr, which is
exactly what a cons is...

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
Jul 18 '05 #19

Michael Hudson wrote:
Alex Martelli <al***@aleax.it> writes:
So -- it's clear to me that I do NOT understand what's going on
here. If just the ags[:] is 35.2 usec, how can the ags[:].pop(0)
be 24.5 ... ???
How quiet was your machine when you ran the tests? I see behaviour


Very, and the numbers were highly repeatable:

-- running just now:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 36.9 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 35.2 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 35.8 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
10000 loops, best of 3: 35.6 usec per loop

-- and from a copy & paste of the screen as of a while ago:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24.5 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24.5 usec per loop

-- _however_ -- retrying the latter now:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 48.8 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 46.1 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 46.5 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24.5 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24.4 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24.5 usec per loop

-- _SO_ -- immediately going back to the just-copy tests...:

[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
100000 loops, best of 3: 19.3 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
100000 loops, best of 3: 19.3 usec per loop
[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)' 'x=ags[:]'
100000 loops, best of 3: 19.3 usec per loop
SO -- so much for the "quiet"... clearly there _IS_ something periodically
running and distorting the elapsed-time measurements (which are the default
here on Linux) by a hefty factor of 2. So much for this kind of casual
benchmarking...!-) I guess I've been doing a little too much of it...

more in line with what you'd expect:

[mwh@pc150 build]$ ./python ../Lib/timeit.py -s'ags=range(1000)'
'x=ags[:]'
10000 loops, best of 3: 31.6 usec per loop
[mwh@pc150 build]$ ./python ../Lib/timeit.py -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 39.8 usec per loop


Yep, makes more sense. So, moving to the more-stable -c (CPU time,
as given by time.clock):

[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'x=ags[:]'
100000 loops, best of 3: 19.2 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'x=ags[:]'
100000 loops, best of 3: 19.2 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'x=ags[:]'
100000 loops, best of 3: 19.2 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 24 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'x=ags[:].pop(0)'
10000 loops, best of 3: 23 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'ags.reverse(); x=ags[:].pop()'
10000 loops, best of 3: 23 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'ags.reverse(); x=ags[:].pop()'
10000 loops, best of 3: 22 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'ags.reverse(); x=ags[:].pop()'
10000 loops, best of 3: 23 usec per loop
[alex@lancelot python2.3]$ python timeit.py -c -s'ags=range(1000)'
'ags.reverse(); x=ags[:].pop()'
10000 loops, best of 3: 23 usec per loop

it WOULD seem to be confirmed that reverse-then-pop() is VERY
slightly faster than pop(0) -- 22/23 instead of 23/24 usec for
a 1000-long list... of which 19.2 are the apparently-repeatable
overhead of copying that list. I still wouldn't say that I
"KNOW" this, though -- the margin is SO tiny and uncertain...!!!

It seems to scale up linearly going from 1000 to 5000: just
the copy, 105-108; ags[:].pop(0), 120-123; reverse then pop,
116-117. This is always with the -c (once burned...).
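(For what it's worth, timeit's `repeat()` API makes the "best of several runs" defense explicit, which guards against exactly the background-load spikes seen above; a Python 3 sketch, with the thread's list size of 1000 and a smaller loop count:)

```python
import timeit

# Best-of-several-repeats: min() discards runs inflated by background load.
setup = "ags = list(range(1000))"
copy_only = min(timeit.repeat("x = ags[:]", setup, number=1000, repeat=3))
copy_pop0 = min(timeit.repeat("x = ags[:].pop(0)", setup, number=1000, repeat=3))
print(copy_only, copy_pop0)
```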
Alex

Jul 18 '05 #20

David Eppstein wrote:
In article <22**********************@news1.tin.it>,
Alex Martelli <al***@aleax.it> wrote:
David Eppstein wrote:
>> In the mean time, it isn't too hard to write a function which does
>> this:
>>
>> def first_rest(x):
>> return x[0], x[1:]
>
> Shouldn't that be called cons?


Hmmm, feels more like the 'reverse' of a cons to me -- takes a list
and gives me the car and cdr...


Well, but it returns an object composed of a car and a cdr, which is
exactly what a cons is...


Yeah, the psychological issue is no doubt with calling 'list' that
argument -- cons takes TWO separate arguments (and the SECOND one
is normally a list), while this is being discussed in the context
of UN-packing, AND takes a single argument to boot. While of course
it's true that "return this, that" returns a tuple, thinking of it
in terms of "returning multiple objects" is most natural when you're
planning to unpack that tuple before it's had time to look around...;-).
Alex

Jul 18 '05 #21

Alex Martelli <al***@aleax.it> wrote previously:
|I'm not sure what this 'more' is about, actually. Greg's case is
|currently best solved [IMHO] with
| args = cmdstring.split()
| command = args.pop(0)

Part of his description was "I might or might not have more in the
list." My 'more' stuff was just illustrating handling that extra stuff.
Admittedly, that's not directly part of what 'car,*cdr=lst' solves.

|Actually, I don't -- both args.reverse() and args.pop(0) are
|O(len(args))

Hmmm... I am pretty sure I have found Alex in an outright mistake. A
shocking thing, if so :-).

|[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
|'x=ags[:].pop(0)'
|10000 loops, best of 3: 24.5 usec per loop
|[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
|'ags.reverse(); x=ags[:].pop()'
|10000 loops, best of 3: 23.2 usec per loop

The second operation does a '.reverse()' for every '.pop()', which is
both the wrong behavior, and not the relevant thing to time.
>>> from time import clock
>>> def reverse_then_pop(l):
...     start = clock()
...     l.reverse()
...     while 1:
...         try: l.pop()
...         except: break
...     print "%.2f seconds" % (clock()-start)
...
>>> def pop_from_left(l):
...     start = clock()
...     while 1:
...         try: l.pop(0)
...         except: break
...     print "%.2f seconds" % (clock()-start)
...
>>> reverse_then_pop(range(10000))
0.03 seconds
>>> pop_from_left(range(10000))
0.39 seconds
>>> reverse_then_pop(range(20000))
0.06 seconds
>>> pop_from_left(range(20000))
1.47 seconds
>>> reverse_then_pop(range(50000))
0.14 seconds
>>> pop_from_left(range(50000))
11.73 seconds
>>> reverse_then_pop(range(100000))
0.29 seconds
>>> pop_from_left(range(100000))
140.77 seconds
>>> reverse_then_pop(range(500000))
1.41 seconds
>>> pop_from_left(range(500000))

...abandoned after a few hours :-)...

Yours, David...

--
Keeping medicines from the bloodstreams of the sick; food from the bellies
of the hungry; books from the hands of the uneducated; technology from the
underdeveloped; and putting advocates of freedom in prisons. Intellectual
property is to the 21st century what the slave trade was to the 16th.

Jul 18 '05 #22

Alex Martelli wrote:
How sweet it would be to be able to unpack by coding:
head, *tail = alist


+1000 :)

I'd LOVE to have this syntax around. I'd even want:

head, *body, last = alist

and

*head_and_body, last = alist

to both work in their respectively obvious ways ;) (left-to-right unpacking,
empty lists/tuples returned for elements which can't be unpacked --like if
alist contains just one element, and the lhs returning tuple/list based on the
type of the rhs).

We can dream :)

Cheers,

f
Jul 18 '05 #23

David Mertz wrote:
Alex Martelli <al***@aleax.it> wrote previously:
|I'm not sure what this 'more' is about, actually. Greg's case is
|currently best solved [IMHO] with
| args = cmdstring.split()
| command = args.pop(0)

Part of his description was "I might or might not have more in the
list." My 'more' stuff was just illustrating handling that extra stuff.
Admittedly, that's not directly part of what 'car,*cdr=lst' solves.
Right, and that's what we were discussing.

|Actually, I don't -- both args.reverse() and args.pop(0) are
|O(len(args))

Hmmm... I am pretty sure I have found Alex in an outright mistake. A
shocking thing, if so :-).
Nope. We're discussing situations where ONE pop from the start is
needed; and you claimed I knew that in that case reversing the list
and popping from the end was faster.
|[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
|'x=ags[:].pop(0)'
|10000 loops, best of 3: 24.5 usec per loop
|[alex@lancelot python2.3]$ python timeit.py -s'ags=range(1000)'
|'ags.reverse(); x=ags[:].pop()'
|10000 loops, best of 3: 23.2 usec per loop

The second operation does a '.reverse()' for every '.pop()', which is
both the wrong behavior, and not the relevant thing to time.


It's exactly the right behavior and exactly the relevant thing. Note
the [:] which I discussed in my post -- I'm not interested in popping
all items one after the other (which would have absolutely nothing to do
with 'car, *cdr = thelist'), I'm interested in popping ONE item -0- the
first one. So, the comparison is between one pop(0) and one
reverse() then one pop() [of course, not with a [:] -- except for
the need to cooperate with timeit.py's measurements in what would
otherwise be a 'destructive' operation and thus unmeasurable by
it]. (My error, as Michael helped me see, was in not using -c or
making my machine perfectly quiescent -- I _thought_ it was but a
little experimentation soon showed it was anything but; anyway,
even _with_ better measurement, reverse-then-pop does turn out
to be faster -- by a miniscule margin, on this release and on this box --
than pop(0) -- I'm not quite sure I "know" it, yet, though...).
Alex

Jul 18 '05 #24

Fernando Perez wrote:
Alex Martelli wrote:
How sweet it would be to be able to unpack by coding:
head, *tail = alist
+1000 :)

I'd LOVE to have this syntax around. I'd even want:

head, *body, last = alist

and

*head_and_body, last = alist

to both work in their respectively obvious ways ;) (left-to-right
unpacking, empty lists/tuples returned for elements which can't be


So far so good...
unpacked --like if alist contains just one element, and the lhs returning
tuple/list based on the type of the rhs).


_blink_ uh, what...?
>>> a,b,c='lop'
>>> x=array.array('c','lop')
>>> c,b,a=x
>>> a,b,c=xrange(3)
>>> def tt():
...     yield '1'; yield 22; yield 33
...
>>> a,b,c=tt()


...so WHAT type would you want for the lhs *andalltherest term in each
of these cases...??? I'd be _particularly_ curious about the last one...

In other words, given the rhs in unpacking can be ANY iterable, including,
for example, a generator, it's (IMHO) absurd to try to have the *blah arg
on the lhs 'take on that type'. Doing it for some special kinds of sequence
and dumping all the others into [e.g.] list or tuple, sort of like filter
does, is a serious mistake again IMHO -- try teaching *that* a few times.

I would want the type of the rhs to be irrelevant, just as long as it's any
iterable whatsoever, and bundle the 'excess items' into ONE specified
kind of sequence. I also think it would be best for that one kind to be
tuple, by analogy to argument passing: you can use any iterable as the
actual argument 'expanded' by the *foo form, but whatever type foo
is, if the formal argument is *bar then in the function bar is a TUPLE --
period, simplest, no ifs, no buts. Maybe using a list instead might have
been better, but tuple was chosen, and I think we should stick to that
for unpacking, because the analogy with argument passing is a good
part of the reason the *foo syntax appeals to me (maybe for the same
reason we should at least for now forego the *foo appearing in any
spot but the last... though that _would_ be a real real pity...).

Oh, and formats for module struct -- THERE, too, allowing a star
would sometimes be very VERY useful and handy, perhaps even
more than in unpacking-assignment...
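(A sketch of the argument-passing analogy described above — and, for the record, PEP 3132 eventually chose a *list* for the starred assignment target, not a tuple:)

```python
def f(first, *rest):
    # In a function signature, *rest is always a tuple,
    # whatever kind of iterable supplied the arguments.
    return rest

assert f(*[1, 2, 3]) == (2, 3)
assert f(*range(3)) == (1, 2)
assert f(1) == ()              # empty tail is still a tuple: ()

# PEP 3132 (Python 3.0) went the other way for assignment targets:
a, *tail = (1, 2, 3)
assert tail == [2, 3]          # a list, regardless of the rhs type
```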
Alex

Jul 18 '05 #25

Alex Martelli <al***@aleax.it> wrote previously:
|Nope. We're discussing situations where ONE pop from the start is
|needed; and you claimed I knew that in that case reversing the list
|and popping from the end was faster.

Nah. I believe the context of my first note in the thread made the
imagined situation obvious. The situation Greg described was where you
want to look (from the left) at one item at a time, then decide
whether/when to grab the next based on that one.

That one-at-a-time-but-many-times seems to me the case where
'car,*cdr=thelist' is most useful. And for that, one-reverse-
plus-many-pops is the best current approach.
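The one-reverse-plus-many-pops idiom being discussed can be sketched like this (the function name `consume` is invented for the demo):

```python
def consume(args):
    # One O(n) reverse up front; after that, every pop comes off the END
    # of the list in O(1), instead of pop(0) shifting the remainder each time.
    work = list(args)
    work.reverse()
    seen = []
    while work:
        item = work.pop()   # cheap: pops from the right
        seen.append(item)   # stand-in for "decide whether/when to grab the next"
    return seen
```

The items still come out in left-to-right order, which is the whole point of reversing first.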

I quite concur with Alex that a '.reverse()' is not particularly
worthwhile to do just one '.pop()' (I guess it turned out to save a
couple milliseconds, but nothing important). The hermeneutics of the
thread aren't worth getting hung up on.

Of course, in Greg's case of parsing a dozen command-line switches, the
speed issue is not interesting. The whole '.reverse()' thing only
matters when you need to handle thousands of items (and *that*, I think,
Alex will admit he knows :-)).

Yours, David...

--
mertz@ _/_/_/_/ THIS MESSAGE WAS BROUGHT TO YOU BY: \_\_\_\_ n o
gnosis _/_/ Postmodern Enterprises \_\_
..cx _/_/ \_\_ d o
_/_/_/ IN A WORLD W/O WALLS, THERE WOULD BE NO GATES \_\_\_ z e
Jul 18 '05 #26

On Fri, 03 Oct 2003 15:53:34 -0600, Fernando Perez <fp*******@yahoo.com> wrote:
Alex Martelli wrote:
How sweet it would be to be able to unpack by coding:
head, *tail = alist


+1000 :)

I'd LOVE to have this syntax around. I'd even want:

head, *body, last = alist

and

*head_and_body, last = alist

to both work in their respectively obvious ways ;) (left-to-right unpacking,
empty lists/tuples returned for elements which can't be unpacked --like if
alist contains just one element, and the lhs returning tuple/list based on the
type of the rhs).

We can dream :)

Maybe spelling it a little differently wouldn't be too bad?
(tested only as you see ;-):
>>> def headtail(seq): it=iter(seq); yield it.next(); yield list(it)
...
>>> h,t = headtail(range(5))
>>> h
0
>>> t
[1, 2, 3, 4]
>>> h,t
(0, [1, 2, 3, 4])

Or from a generator:
>>> def g():
...     for i in range(0,100,10): yield i
...
>>> h,t = headtail(g())
>>> h,t
(0, [10, 20, 30, 40, 50, 60, 70, 80, 90])

Not so useful:
>>> h,t = headtail('abcd')
>>> h,t
('a', ['b', 'c', 'd'])

Changes to list:
>>> h,t = headtail( ('abc','123',1,2,3) )
>>> h
'abc'
>>> t
['123', 1, 2, 3]

So, to preserve common seq types, you could do a type-preserving-headtail
>>> def tpheadtail(seq):
...     if isinstance(seq, str): yield seq[0]; yield seq[1:]; return
...     it = iter(seq); yield it.next()
...     if isinstance(seq, tuple): yield tuple(it)
...     else: yield list(it)
...
>>> h,t = tpheadtail(range(5))
>>> h,t
(0, [1, 2, 3, 4])
>>> h,t = tpheadtail(tuple(range(5)))
>>> h,t
(0, (1, 2, 3, 4))
>>> h,t = tpheadtail('abcdef')
>>> h,t
('a', 'bcdef')

You could control repacking ad libitum by adding a packing-format-control string parameter,
say '.' for one element, a decimal>1 for n elements in a tuple or list, one '*' for as many
elements as exist in the context, no prefix to preserve seq type, prefix T to make tuple,
L to make list. Hence the default headtail format would be '.*' -- some example spellings:

h,t = repack(seq) # head, *tail
h,t = repack(seq, '*.') # *head, tail
h,m,t = repack(seq, '.*.') # head, *middle, last
e1,e2,e3,tup,em2,em1 = repack(seq, '...t*..') # force seq[3:-2] to tuple for tup
t5,myList123,restAsList = repack(seq, 'T5L123,L*')

This is a tangent from a prime number tangent from ... er, so the implementation of
the above will be left as an exercise ;-)

def repack(seq, fmt='.*'):
# ... no ;-)
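Since the implementation is left as an exercise, here is one possible partial answer -- a sketch that handles only a subset of the proposed format language ('.' for one item, a run of digits for that many items as a list, a single trailing '*' for the rest as a tuple). The T/L type prefixes and a mid-sequence '*' are not attempted:

```python
import re

def repack(seq, fmt='.*'):
    # Simplified sketch of the format language proposed above:
    #   '.'    -> one bare item
    #   digits -> that many items, as a list
    #   '*'    -> all remaining items, as a tuple (last position only)
    items = list(seq)
    out, i = [], 0
    for tok in re.findall(r'\.|\*|\d+', fmt):
        if tok == '.':
            out.append(items[i]); i += 1
        elif tok == '*':
            out.append(tuple(items[i:])); i = len(items)
        else:
            n = int(tok)
            out.append(items[i:i + n]); i += n
    return out

h, t = repack(range(5))   # default '.*' gives head, rest-as-tuple
```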




Regards,
Bengt Richter
Jul 18 '05 #27

David Mertz wrote:
Alex Martelli <al***@aleax.it> wrote previously:
|Nope. We're discussing situations where ONE pop from the start is
|needed; and you claimed I knew that in that case reversing the list
|and popping from the end was faster.

Nah. I believe the context of my first note in the thread made the
imagined situation obvious. The situation Greg described was where you
want to look (from the left) at one item at a time, then decide
whether/when to grab the next based on that one.
He was describing a commandverb / args separation. _One_ commandverb,
zero or more args to go with it.
That one-at-a-time-but-many-times seems to me the case where
'car,*cdr=thelist' is most useful. And for that, one-reverse-
plus-many-pops is the best current approach.
If you have a mapping from command-verb to number of req arguments
(with all further optional args to be left as the sequence), I
think slicing is in fact clearer:

command_verb = args[0]
num_req_args = numargs.get(command_verb, 0)
requiredargs = args[1:1+num_req_args]
other_args = args[1+num_req_args:]

I don't think any unpacking or popping, with or without reversing
thrown in, can compete with this clarity.
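Filled out with a concrete `numargs` table (the dict name comes from the snippet above; these verbs and arity values are invented for the demo), the slicing version runs like this:

```python
# Illustrative command table: verb -> number of required arguments.
numargs = {'copy': 2, 'delete': 1}

def split_command(args):
    command_verb = args[0]
    num_req_args = numargs.get(command_verb, 0)
    required_args = args[1:1 + num_req_args]
    other_args = args[1 + num_req_args:]
    return command_verb, required_args, other_args
```

So `split_command(['copy', 'src', 'dst', '-f'])` yields the verb, its two required arguments, and the leftover options, all via plain slicing.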

In most other use cases where it might initially seems that the
best idea is "consuming" the sequence (and thus that reversing
might be a clever idea), it turns out that just *iterating* on
the sequence (or an iter(...) thereof, to keep iteration state)
is in fact quite preferable.
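The iterate-instead-of-consume alternative looks like this (a sketch; `process` and the upper-casing body are invented stand-ins):

```python
def process(args):
    # iter(...) keeps the "consumption" state in the iterator itself:
    # no pop(0), no reverse(), and it works for any iterable.
    it = iter(args)
    verb = next(it)                     # grab the head...
    rest = [arg.upper() for arg in it]  # ...then walk the tail
    return verb, rest
```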

I quite concur with Alex that a '.reverse()' is not particularly
worthwhile to do just one '.pop()' (I guess it turned out to save a
couple milliseconds, but nothing important). The hermeneutics of the
_micro_seconds (for lists of several thousands of items).
thread aren't worth getting hung up on.
If you have no grounds to claim I made a mistake, then please don't
(and apologize if you think you have done so in error). If you do
think you can prove it, and I disagree, then it seems to me that
this disagreement IS "worth getting hung up on".

Of course, in Greg's case of parsing a dozen command-line switches, the
speed issue is not interesting. The whole '.reverse()' thing only
matters when you need to handle thousands of items (and *that*, I think,
Alex will admit he knows :-)).


I do know (having read Knuth) that micro-optimizations should be ignored
perhaps 97% of the time, and can well guess that this case doesn't look
anywhere close to the remaining "perhaps 3%". I.e., that (like most
debates on performance) this one has little substance or interest.

Much more interesting might be to give the list.reverse method optional
start and end parameters to reverse in-place a contiguous slice of a
list -- the current alternative:
aux = somelist[start:end]
aux.reverse()
somelist[start:end] = aux
is just too sucky, substantially diminishing the interest of the
reverse method. But that, of course, is a totally unrelated subject.
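The three-line workaround can at least be wrapped in a helper (the name `reverse_slice` is invented; note that `list.reverse` never did grow start/end parameters, so slice assignment remains the way to do this):

```python
def reverse_slice(somelist, start, end):
    # In-place reversal of a contiguous slice, via slice assignment.
    somelist[start:end] = somelist[start:end][::-1]

lst = [0, 1, 2, 3, 4, 5]
reverse_slice(lst, 1, 4)   # lst becomes [0, 3, 2, 1, 4, 5]
```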
Alex
Jul 18 '05 #28

|> The hermeneutics of the thread aren't worth getting hung up on.

Alex Martelli <al***@aleax.it> wrote previously:
|If you have no grounds to claim I made a mistake, then please don't
|(and apologize if you think you have done so in error).

Well, OK... if you want to get hung up on hermeneutics, the sequence
was:

(0) Greg used the example of command-line processing to argue for the
addition of 'car,*cdr=thelist' to Python (which Alex and I both
like too).

(1) I made the claim (well-known to Alex, but perhaps not to newbies
who might be reading) that reverse-with-many-pops is (much) faster
than many-pops-from-left.

(2) Alex wrote: "No, you're wrong", and posted a comparison of one
reverse with one left pop.

(3) I observed that such was the wrong issue. I don't think I
insinuated it was a -morally- wrong issue, just not the one I
was interested in.

(4) Alex seemed WAY oversensitive about said observation.

(5) I suggested, in a conciliatory mood, that we not get hung up on
he-said/she-said (hermeneutics).

(6) Alex demands an apology.

(7) I post this chronology.

Hopefully, the next part is (8) Alex chills out.

All the best, David...

--
mertz@ _/_/_/_/_/_/_/ THIS MESSAGE WAS BROUGHT TO YOU BY:_/_/_/_/ v i
gnosis _/_/ Postmodern Enterprises _/_/ s r
..cx _/_/ MAKERS OF CHAOS.... _/_/ i u
_/_/_/_/_/ LOOK FOR IT IN A NEIGHBORHOOD NEAR YOU_/_/_/_/_/ g s
Jul 18 '05 #29

David Mertz wrote:
|> The hermeneutics of the thread aren't worth getting hung up on.

Alex Martelli <al***@aleax.it> wrote previously:
|If you have no grounds to claim I made a mistake, then please don't
|(and apologize if you think you have done so in error).

Well, OK... if you want to get hung up on hermeneutics, the sequence
was:

(0) Greg used the example of command-line processing to argue for the
addition of 'car,*cdr=thelist' to Python (which Alex and I both
like too).

(1) I made the claim (well-known to Alex, but perhaps not to newbies
who might be reading) that reverse-with-many-pops is (much) faster
than many-pops-from-left.
You did not clearly express at that time that you had changed the
subject (without changing the Subject: header) from the subject
of star-unpacking to the one of "*MANY* pops". Thus, I think I
was entirely justified in continuing to address the subject in
the header (and that your expression of your intentions left a
lot to be desired). As you indicate you do not want to continue
discussing this, I will not; but it seems this is the crucial
point of disagreement, which I believe made your assertion about
having "found Alex in an outright mistake" (which you chose to
post as a comment to my assertion that both args.reverse() and
args.pop(0) are O(len(args)) -- an assertion that is anything
but a mistake) apparently justified in your opinion, and utterly
unjustified in mine. I _do_ make "outright mistakes" (why, just
the other day I hastily posted a list comprehension as
[x in seq] instead of [x for x in seq]!!!), and I apologize when
it happens. Having carefully re-read and examined this thread,
I am entirely convinced that this is not one such case, and not
happy at all that, despite having IN THAT VERY SAME POST admitted
that you had murkily been "changing the subject and not the Subject:"
("Admittedly, that's not directly part of what" we were discussing
is what you said!), you STILL appear convinced that no apology is
warranted about that "outright mistake" observation.

Hopefully, the next part is (8) Alex chills out.


Not to worry: I'm more than chilly, I'm _ICY_ with people who claim
I am wrong and fail to back off when I am convinced that my evidence
to the contrary is entirely satisfactory. At least, I think the
reason for my iciness has been made perfectly clear -- although I'm
still in the dark about why you still think your "outright mistake"
assertion was correct, I won't lose any sleep over that.
Alex

Jul 18 '05 #30

I've got to drop in my two cents :)
I wanted to see how clean I could make the algorithms look.

import random

def quicksort(lst):
    if len(lst) <= 1: return lst

    left = [x for x in lst if x < lst[0]]
    middle = [x for x in lst if x == lst[0]]
    right = [x for x in lst if x > lst[0]]

    return quicksort(left) + middle + quicksort(right)

>>> quicksort([random.randint(0, 20) for dummy in range(20)])
[0, 1, 2, 4, 4, 4, 4, 4, 6, 6, 8, 9, 12, 12, 13, 14, 14, 14, 16, 16]

Hopefully this is still nlogn :)

and then:

import itertools

def primes():
    p = {}
    for cand in itertools.count(2):
        if not [True for factor in p if not cand % factor]:
            p[cand] = None
            yield cand

list(itertools.islice(primes(), 0, 20))
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]
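The generator above trial-divides each candidate by every prime collected so far. A variant (same idea, not from the thread) can stop at sqrt(cand), since the primes are produced in increasing order:

```python
import itertools

def primes():
    found = []
    for cand in itertools.count(2):
        # Only trial-divide by primes f with f*f <= cand; takewhile works
        # because found is built in increasing order.
        if all(cand % f for f in
               itertools.takewhile(lambda f: f * f <= cand, found)):
            found.append(cand)
            yield cand

# list(itertools.islice(primes(), 0, 20)) gives the same first 20 primes.
```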
Jul 18 '05 #31

In article <7f**************************@posting.google.com >,
im******@telus.net (Ian McMeans) wrote:
def quicksort(lst):
    if len(lst) <= 1: return lst

    left = [x for x in lst if x < lst[0]]
    middle = [x for x in lst if x == lst[0]]
    right = [x for x in lst if x > lst[0]]

    return quicksort(left) + middle + quicksort(right)
quicksort([random.randint(0, 20) for dummy in range(20)])

[0, 1, 2, 4, 4, 4, 4, 4, 6, 6, 8, 9, 12, 12, 13, 14, 14, 14, 16, 16]

Hopefully this is still nlogn :)


Well, for random inputs it is. If you want it to be O(n log n) even for
sorted inputs you could change it a little:

import random

def quicksort(lst):
    if len(lst) <= 1: return lst
    pivot = lst[random.randrange(len(lst))]

    left = [x for x in lst if x < pivot]
    middle = [x for x in lst if x == pivot]
    right = [x for x in lst if x > pivot]

    return quicksort(left) + middle + quicksort(right)

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
Jul 18 '05 #32

Alex Martelli wrote:
I would want the type of the rhs to be irrelevant, just as long as it's any
iterable whatsoever, and bundle the 'excess items' into ONE specified
kind of sequence. I also think it would be best for that one kind to be
tuple, by analogy to argument passing: you can use any iterable as the
actual argument 'expanded' by the *foo form, but whatever type foo
is, if the formal argument is *bar then in the function bar is a TUPLE --
period, simplest, no ifs, no buts. Maybe using a list instead might have
been better, but tuple was chosen, and I think we should stick to that
for unpacking, because the analogy with argument passing is a good
part of the reason the *foo syntax appeals to me (maybe for the same
reason we should at least for now forego the *foo appearing in any
spot but the last... though that _would_ be a real real pity...).


Agreed: a single convention (and following tuples is a good one, if nothing
else b/c it's the existing one) is probably the sanest solution. I hadn't
thought of generators and arbitrary iterables, partly b/c I still use pretty
much 'naked' python 2.1 for everything.

Cheers,

f.

Jul 18 '05 #33
