By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
424,835 Members | 1,454 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 424,835 IT Pros & Developers. It's quick & easy.

Creating a List of Empty Lists

P: n/a
Pythons internal 'pointers' system is certainly causing me a few
headaches..... When I want to copy the contents of a variable I find
it impossible to know whether I've copied the contents *or* just
created a new pointer to the original value....

For example I wanted to initialize a list of empty lists....

a=[ [], [], [], [], [] ]

I thought there has to be a *really* easy way of doing it - after a
bit of hacking I discovered that :
a = [[]]*10 appeared to work

however - using it in my program called bizarre crashes....
I eventually discovered that (as a silly example) :
a = [[]]*10
b=-1
while b < 10:
b += 1
a[b] = b
print a

Produced :
[ [9], [9], [9]......

Which isn't at all what I intended...............
What is the correct, quick way of doing this (without using a loop and
appending...) ?

Fuzzyman

http://www.Voidspace.org.uk
The Place where headspace meets cyberspace. Online resource site -
covering science, technology, computing, cyberpunk, psychology,
spirituality, fiction and more.

---
http://www.atlantibots.org.uk
http://groups.yahoo.com/group/atlantis_talk/
Atlantibots - stomping across the worlds of Atlantis.
---
http://www.fuchsiashockz.co.uk
http://groups.yahoo.com/group/void-shockz
---

Everyone has talent. What is rare is the courage to follow talent
to the dark place where it leads. -Erica Jong
Ambition is a poor excuse for not having sense enough to be lazy.
-Milan Kundera
Jul 18 '05 #1
Share this Question
Share on Google+
23 Replies


P: n/a
mi*****@foord.net (Fuzzyman) wrote in
news:80**************************@posting.google.c om:
I eventually discovered that (as a silly example) :
a = [[]]*10
b=-1
while b < 10:
b += 1
a[b] = b
print a

Produced :
[ [9], [9], [9]......

Which isn't at all what I intended...............
What is the correct, quick way of doing this (without using a loop and
appending...) ?


The recommended way these days is usually:

a = [ [] for i in range(10) ]

That still has a loop and works by appending empty lists, but at least its
just a single expression. Also you can incorporate the next stage of your
initialisation quite easily:

a = [ [b] for b in range(10) ]

--
Duncan Booth du****@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
Jul 18 '05 #2

P: n/a
Fuzzyman wrote:
Pythons internal 'pointers' system is certainly causing me a few
headaches..... When I want to copy the contents of a variable I find
it impossible to know whether I've copied the contents *or* just
created a new pointer to the original value....

For example I wanted to initialize a list of empty lists....

a=[ [], [], [], [], [] ] [...] What is the correct, quick way of doing this (without using a loop and
appending...) ?


l = [ [] for i in xrange (3)]
l [[], [], []] l [0].append ('a')
l

[['a'], [], []]

Daniel

Jul 18 '05 #3

P: n/a
mi*****@foord.net (Fuzzyman) wrote:
Pythons internal 'pointers' system is certainly causing me a few
headaches..... When I want to copy the contents of a variable I find
it impossible to know whether I've copied the contents *or* just
created a new pointer to the original value....

For example I wanted to initialize a list of empty lists....

a=[ [], [], [], [], [] ]

I thought there has to be a *really* easy way of doing it - after a
bit of hacking I discovered that :
a = [[]]*10 appeared to work

however - using it in my program called bizarre crashes....
I eventually discovered that (as a silly example) :
a = [[]]*10
b=-1
while b < 10:
b += 1
a[b] = b
print a

Produced :
[ [9], [9], [9]......

Which isn't at all what I intended...............
What is the correct, quick way of doing this (without using a loop and
appending...) ?


Here it produced an IndexError. After changing "b < 10" into "b < 9"
the code produced:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

I see some other posters already gave you the answer. I'll do
something different and give you the question :-)

n = 4
agents = [[]]*n
print agents
agents[0].append('Smith')
print agents
neos = map(list,[[]]*n)
print neos
neos[0].append('Neo')
print neos

output is:

[[], [], [], []]
[['Smith'], ['Smith'], ['Smith'], ['Smith']]
[[], [], [], []]
[['Neo'], [], [], []]

The question is:

Why is "Smith" copied to all elements in the matrix?

(or is that another movie :-)

Anton

Jul 18 '05 #4

P: n/a
>>>>> "Anton" == Anton Vredegoor <an***@vredegoor.doge.nl> writes:

Anton> Why is "Smith" copied to all elements in the matrix?

:)

The construct

[[]] * n

gives you a list with n references to the same list. When you modify
one of the elements, all the references will see the changes.

You can use one of the three (there are more)

([[]]*n)[:]
[[] for _ in range(n)]
map (lambda _: [], range(n))

to get n different copies of [].

Sam
--
Samuel Tardieu -- sa*@rfc1149.net -- http://www.rfc1149.net/sam
Jul 18 '05 #5

P: n/a
Samuel Tardieu wrote:
You can use one of the three (there are more)

([[]]*n)[:]


This won't work. [:] makes a shallow copy. This will make a different
containing list, but the contained lists will still be identical:
s = [[]]*10
s = [[]]*5
t = s[:]
id(s) 1076779980 id(t) 1076780300 map(lambda (x, y): x is y, zip(s, t)) [True, True, True, True, True] s[0].append(2)
s [[2], [2], [2], [2], [2]] t

[[2], [2], [2], [2], [2]]

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
__ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
/ \
\__/ I'm sharing the joy / I'm glowing like sunshine
-- Chante Moore
Jul 18 '05 #6

P: n/a
Fuzzyman wrote in message
<80**************************@posting.google.com>. ..
Pythons internal 'pointers' system is certainly causing me a few
headaches..... When I want to copy the contents of a variable I find
it impossible to know whether I've copied the contents *or* just
created a new pointer to the original value....

You don't need to divine the rules for copy vs. reference: Python NEVER
copies values, ONLY dereferences names, unless you EXPLICITLY ask for a copy
(which Python-the-language doesn't have any special machinery for; you have
to use the copy module or figure out how to copy the object yourself.)

It would help if you stopped thinking in terms of variables and pointers (as
in C) and thought instead in terms of names and objects. Python doesn't
have variables in the C sense, where the variable name stands for its value
ONLY up to the compilation stage, at which time the variable names cease to
exist. Rather, in Python, a "variable" is a name which points to an object.
This behavior is implemented as a key:value pair in a dictionary (i.e.,
__dict__), where the key is a string holding the name and the value is the
object itself. A Python "pointer" is a sort of indexed name, like C arrays;
hence the square-bracket syntax. However, even though the name is strictly
speaking unnamed (i.e., there is no corresponding string object in the
namespace dictionary), yet it is still purely a referent to a real object,
and not a real object itself.

Another way to interpret "pointer" in a Pythonic framework is to say its a
weak-reference: i.e., a reference to an object which does not increase that
object's reference count. However, there is no clear correspondence between
C's "variable/pointer" concepts and what Python does, only analogy.

When no names point to a given object, that object is a candidate for
garbage collection.

All these concepts are illustrated in the following two lines:
[[]]*2 [[], []]

This means:

- Create an empty list.
- Create a list which references that empty list.
- Create an integer '2'. (This step may be optimized away by pre-created
integer objects; pre-creating an object is called "interning" it--these
objects are immortal. You can make one with the intern() builtin.)
- Call the __mul__ method of the one-element list, with an argument of 2.
- The __mul__ method creates a new list, and inserts references to its own
elements into this new list--twice. References/names can *only* be copied
(by definition), not pointed to.
- __mul__ then returns the new list.
- This new list is not bound to any name, and so the object is impossible to
access. It is now a candidate for garbage collection.

So, in this one line, four objects were created and destroyed, not five.

The following behavior should now make sense:
L = [[]]*2
L [[], []] L[0].append(1)
L [[1], [1]]

L[0] and L[1] are two different names, but the object they both point to has
been modified. However:
L[0] = 1
L

[1, [1]]

Here, the name L[0] was made to point to a new object, the integer 1.

The only mutable objects you usually have to worry about are dicts and
lists.

--
Francis Avila

Jul 18 '05 #7

P: n/a
"Francis Avila" <fr***********@yahoo.com> wrote in message news:<vs************@corp.supernews.com>...
Fuzzyman wrote in message
<80**************************@posting.google.com>. ..
Pythons internal 'pointers' system is certainly causing me a few
headaches..... When I want to copy the contents of a variable I find
it impossible to know whether I've copied the contents *or* just
created a new pointer to the original value....

You don't need to divine the rules for copy vs. reference: Python NEVER
copies values, ONLY dereferences names, unless you EXPLICITLY ask for a copy
(which Python-the-language doesn't have any special machinery for; you have
to use the copy module or figure out how to copy the object yourself.)

[snip...] Interesting discussion reluctantly snipped.......
The only mutable objects you usually have to worry about are dicts and
lists.


Right - because if I do something like :

a = 4
b = a
a = 5
print b

It prints 4... not 5.
In other words - the line b = a creates a name pointing to the object
4, rather than a name pointing to the contents of a.....

I think I see what you mean - since the object 4 is immutable......
the line a = 5 destroys the old name a and creates a new one pointing
to object 5, rather than changing what the name a is pointing to.

Since lists and dictionaries are mutable..... changing the contents
modifies the object rather than destroying the refereence tothe old
one and creating a new one.....

Hmmmmmm... thanks.........

Fuzzy
Jul 18 '05 #8

P: n/a
Right - because if I do something like :

a = 4
b = a
a = 5
print b

It prints 4... not 5.
In other words - the line b = a creates a name pointing to the object
4, rather than a name pointing to the contents of a.....


There's no difference between "the object 4" and "the contents of a", so the
"rather than" makes no sense in this context.

After executing

b = a

the names "a" and "b" refer to the same object.
Jul 18 '05 #9

P: n/a
r.e.s. <r.*@XXmindspring.com> wrote:
"David M. Cooke" wrote ...
"r.e.s." wrote:
> But something's wrong with that explanation, because
> of the following:
>
> >>> a = 'gobble'
> >>> a is 'gobble'
> True
>
> Surely `'gobble'` is not the name of an already-existing
> object, so I expected exactly the same result as for
> `100` and `[]`. What's going on there?


Actually, your second use of 'gobble' *is* an already-existing object.
Python 'interns' string constants, i.e., reuses them.

<snip>

Thanks. That explains it.


But take note that this behavior is not guaranteed anywhere in the language
reference. As someone else said in this thread, don't count on interning.
Sometimes it will happen, and sometimes it won't:
a = 'gobble'
a is gobble True b = 'foo bar'
b is 'foo bar'

False

The rule of thumb is that Python interns a string if it's likely to be used
as a name (i.e., only alphanumerics and underscores). The string 'foo bar'
has a space and would be an invalid name, so it wasn't interned.

--
Robin Munn
rm***@pobox.com
Jul 18 '05 #10

P: n/a
In article <Xn**************************@127.0.0.1>, Duncan Booth
<du****@NOSPAMrcp.co.uk> writes
.......

The recommended way these days is usually:

a = [ [] for i in range(10) ]

That still has a loop and works by appending empty lists, but at least its
just a single expression. Also you can incorporate the next stage of your
initialisation quite easily:

a = [ [b] for b in range(10) ]

I seem to remember the fastest way to do this was map(list,n*[[]]) from
a couple of earlier threads, but is that true in 2.3?
--
Robin Becker
Jul 18 '05 #11

P: n/a
Robin Munn:
But take note that this behavior is not guaranteed anywhere in the language reference. As someone else said in this thread, don't count on interning. Sometimes it will happen, and sometimes it won't:
a = 'gobble'
a is gobble True b = 'foo bar'
b is 'foo bar' False

The rule of thumb is that Python interns a string if it's likely to be used as a name (i.e., only alphanumerics and underscores). The string 'foo bar' has a space and would be an invalid name, so it wasn't interned.


And don't think you can get around it using intern():

Python 2.3 (#46, Jul 29 2003, 18:54:32) [MSC v.1200 32 bit (Intel)] on
win32
b = intern('foo bar')
a = 'foo bar'
a is 'foo bar' False b is 'foo bar' False

That apparent space requirement should really be better documented.
b = intern('foo_bar')
b is 'foo_bar' True a = 'foo_bar'
a is 'foo_bar'

True

From the docs on intern():
"Interning strings is useful to gain a little performance on
dictionary lookup "

and while it continues:
"...names used in Python programs are automatically interned, and
the dictionaries used to hold module, class or instance attributes
have interned keys"

It probably should specifically state that only valid identifiers will
be intern()'d and optionally return an error or warning otherwise.

Emile van Sebille
em***@fenx.com

Jul 18 '05 #12

P: n/a

Python 2.3 (#46, Jul 29 2003, 18:54:32) [MSC v.1200 32 bit (Intel)] on win32
b = intern('foo bar')
a = 'foo bar'
a is 'foo bar' False b is 'foo bar'

False

Emile> That apparent space requirement should really be better documented.

The fact that the current implementation of CPython automatically interns
strings which look like identifiers is simply an efficiency consideration.
It's not part of the language definition, so doesn't bear documenting. The
correct way to compare two strings is using '==' (which is independent of
CPython's implementation details), not 'is'.

Skip

Jul 18 '05 #13

P: n/a

Skip Montanaro:
[ quoting me ]
Python 2.3 (#46, Jul 29 2003, 18:54:32) [MSC v.1200 32 bit (Intel)] on win32
>>> b = intern('foo bar')
>>> a = 'foo bar'
>>> a is 'foo bar' False >>> b is 'foo bar'

False

Emile> That apparent space requirement should really be better

documented.
The fact that the current implementation of CPython automatically interns strings which look like identifiers is simply an efficiency consideration. It's not part of the language definition, so doesn't bear documenting. The correct way to compare two strings is using '==' (which is independent of CPython's implementation details), not 'is'.

Skip


OK. But does intern() intern? I (thought I) only used is to show
that it wasn't intern()'d, and as the documentation holds intern'ing
up as an optimization technique, does it _only_ apply to strings that
look like identifiers? How else might you know?

Emile van Sebille
em***@fenx.com

Jul 18 '05 #14

P: n/a
Skip Montanaro wrote in message ...
The
correct way to compare two strings is using '==' (which is independent of
CPython's implementation details), not 'is'.

I think it's better to say that the only time you *use* 'is' is when you
*know* the object you're comparing to is a singleton (None, True, or False).

Every other time, use '=='. And don't be deceived by 'is' working
sometimes.
--
Francis Avila

Jul 18 '05 #15

P: n/a

Emile> But does intern() intern?

Yes, I believe it does:
intern('foo bar') 'foo bar' a = 'foo bar'
a is 'foo bar' 0 a = intern(a)
a is 'foo bar' 0 a is intern('foo bar') 1 a = 'foo_bar'
a is 'foo_bar'

1

Emile> does it _only_ apply to strings that look like identifiers?

Automatic interning, yes. At the point where the system has to decide
whether to automatically intern a string it knows nothing about the higher
level context in which the string appears. The optimization was added
because the strings which are manipulated most often by the interpreter
happen to be those which represent the program's identifiers (object
attributes, variables, etc). Consequently, the decision was made some time
ago to simply intern all strings which look like identifiers (as the snippet
above shows). It's a bit like driving a tack with a sledge hammer, but it
does improve interpreter performance.

Emile> How else might you know?

It doesn't really matter. The intern() function is available to programmers
who want to conciously optimize their string handling and is properly
documented. Automatic interning is not an optimization aimed at the
programmer, but at the internals of the interpreter. It's just a side
effect of the crude optimization that if you happen to use a string literal
in your program which looks like an identifier, it's interned
automatically. Look at is as a freebie. ;-)

Skip

Jul 18 '05 #16

P: n/a
Robin Becker <ro***@jessikat.fsnet.co.uk> wrote in
news:SW**************@jessikat.fsnet.co.uk:
In article <Xn**************************@127.0.0.1>, Duncan Booth
<du****@NOSPAMrcp.co.uk> writes
......

The recommended way these days is usually:

a = [ [] for i in range(10) ]

That still has a loop and works by appending empty lists, but at least
its just a single expression. Also you can incorporate the next stage
of your initialisation quite easily:

a = [ [b] for b in range(10) ]

I seem to remember the fastest way to do this was map(list,n*[[]])
from a couple of earlier threads, but is that true in 2.3?


I think I would prefer to maintain code with the list comprehension rather
than the map which, to my eyes, isn't immediately obvious. However, it
would appear that the list comprehension is much faster in any case, so in
this case I believe the clearest solution is also the fastest.

C:\>d:\python23\lib\timeit.py "[ [] for i in range(10) ]"
10000 loops, best of 3: 28.6 usec per loop

C:\>d:\python23\lib\timeit.py "map(list,10*[[]])"
10000 loops, best of 3: 102 usec per loop

C:\>d:\python23\lib\timeit.py -s ll=list -s mm=map "mm(ll,10*[[]])"
10000 loops, best of 3: 91.9 usec per loop

--
Duncan Booth du****@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
Jul 18 '05 #17

P: n/a
unfortunately all of these tests are slower than 1 clock tick on my
machine. I believe the may be biassed.
--
Robin Becker
Jul 18 '05 #18

P: n/a
Duncan Booth's prompted me to repeat all the nonsense about empty lists
of a few years ago. I am amazed at the differences between pythons.
Lambda is now faster than list!!! I would really like to know why
list([]) is so much slower than list(()). Clearly comprehensions are now
fast, but still slower than the corresponding map with a lambda.
My results all obtained on the same win2k sp4 machine.
C:\tmp>\python20\python ttt.py
list () = 2.09
list [] = 1.19
comprehension = 1.97
copy = 4.69
cCopy.copy = 2.09
lambda z: z[:] = 1.56
lambda z: list(z) = 2.66
lambda z: [] = 1.42

C:\tmp>\python21\python ttt.py
list () = 2.33
list [] = 1.23
comprehension = 1.78
copy = 4.34
cCopy.copy = 2.22
lambda z: z[:] = 1.55
lambda z: list(z) = 2.33
lambda z: [] = 1.41

C:\tmp>\python22\python ttt.py
list () = 3.22
list [] = 1.33
comprehension = 1.59
copy = 4.05
cCopy.copy = 2.13
lambda z: z[:] = 1.69
lambda z: list(z) = 2.55
lambda z: [] = 1.64

C:\tmp>\python23\python ttt.py
list () = 1.73
list [] = 3.22
comprehension = 1.00
copy = 3.94
cCopy.copy = 1.59
lambda z: z[:] = 1.14
lambda z: list(z) = 4.77
lambda z: [] = 0.95

############### ttt.py
import time
s = 'list ()'
t0=time.time()
for y in xrange(1000):
x = map(list,1000*[()])
t1 = time.time()
print "%-20s = %.2f" % (s,(t1-t0))
assert x[0]==[] and x[0] is not x[-1], "%s failed identity" % s

s = 'list []'
t0=time.time()
for y in xrange(1000):
x = map(list,1000*[[]])
t1 = time.time()
print "%-20s = %.2f" % (s,(t1-t0))
assert x[0]==[] and x[0] is not x[-1], "%s failed identity" % s

s = "comprehension"
t0=time.time()
for y in xrange(1000):
x = [[] for i in xrange(1000)]
t1 = time.time()
print "%-20s = %.2f" % (s,(t1-t0))
assert x[0]==[] and x[0] is not x[-1], "%s failed identity" % s

from copy import copy
s = 'copy'
t0=time.time()
for y in xrange(1000):
x = map(copy,1000*[[]])
t1 = time.time()
print "%-20s = %.2f" % (s,(t1-t0))
assert x[0]==[] and x[0] is not x[-1], "%s failed identity" % s

try:
from cCopy import copy as ccopy
s = 'cCopy.copy'
t0=time.time()
for y in xrange(1000):
x = map(ccopy,1000*[[]])
t1 = time.time()
print "%-20s = %.2f" % (s,(t1-t0))
assert x[0]==[] and x[0] is not x[-1], "%s failed identity" % s
except ImportError:
pass

s = 'lambda z: z[:]'
t0=time.time()
for y in xrange(1000):
x = map(lambda z: z[:],1000*[[]])
t1 = time.time()
print "%-20s = %.2f" % (s,(t1-t0))
assert x[0]==[] and x[0] is not x[-1], "%s failed identity" % s

s = 'lambda z: list(z)'
t0=time.time()
for y in xrange(1000):
x = map(lambda z: list(z),1000*[[]])
t1 = time.time()
print "%-20s = %.2f" % (s,(t1-t0))
assert x[0]==[] and x[0] is not x[-1], "%s failed identity" % s

s = 'lambda z: []'
t0=time.time()
for y in xrange(1000):
x = map(lambda z: [],xrange(1000))
t1 = time.time()
print "%-20s = %.2f" % (s,(t1-t0))
assert x[0]==[] and x[0] is not x[-1], "%s failed identity" % s
--
Robin Becker
Jul 18 '05 #19

P: n/a
Robin Becker wrote:
C:\tmp>\python22\python ttt.py
list () = 3.22
list [] = 1.33
lambda z: list(z) = 2.55 C:\tmp>\python23\python ttt.py
list () = 1.73
list [] = 3.22
lambda z: list(z) = 4.77


is this part of the benchmark stable, or was your computer doing
something else in the background? I find it a bit disturbing that a
2.3 optimization would slow down a trivial operation that much...

</F>


Jul 18 '05 #20

P: n/a
Robin Becker <ro***@jessikat.fsnet.co.uk> wrote in
news:tO**************@jessikat.fsnet.co.uk:
Duncan Booth's prompted me to repeat all the nonsense about empty lists
of a few years ago. I am amazed at the differences between pythons.
Lambda is now faster than list!!! I would really like to know why
list([]) is so much slower than list(()). Clearly comprehensions are now
fast, but still slower than the corresponding map with a lambda.
May I ask you to repeat your tests but putting all the code inside
functions, please? I think it is a bit unfair to compare a list
comprehension with a map/lambda and force the list comprehension to access
a global variable every time round the loop.

I think if you let the list comprehension use a local variable it should
comfortably beat the best of your lambdas (at least it does on my system).

By the way, I didn't understand your earlier comment:
unfortunately all of these tests are slower than 1 clock tick on my
machine. I believe the may be biassed.


You obviously think something was wrong with my earlier timings, but I
can't understand from this what the problem was; timeit.py does some
reasonably clever things to try to give an accurate answer including
varying the number of times it repeats the test to ensure that the time per
loop is based on a reasonably large time interval.

--
Duncan Booth du****@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
Jul 18 '05 #21

P: n/a
In article <ma**************************************@python.o rg>,
Fredrik Lundh <fr*****@pythonware.com> writes
Robin Becker wrote:
C:\tmp>\python22\python ttt.py
list () = 3.22
list [] = 1.33
lambda z: list(z) = 2.55

C:\tmp>\python23\python ttt.py
list () = 1.73
list [] = 3.22
lambda z: list(z) = 4.77


is this part of the benchmark stable, or was your computer doing
something else in the background? I find it a bit disturbing that a
2.3 optimization would slow down a trivial operation that much...

</F>

No this is stable I also find it really weird.
--
Robin Becker
Jul 18 '05 #22

P: n/a
In article <Xn***************************@127.0.0.1>, Duncan Booth
<du****@NOSPAMrcp.co.uk> writes
.....
You obviously think something was wrong with my earlier timings, but I
can't understand from this what the problem was; timeit.py does some
reasonably clever things to try to give an accurate answer including
varying the number of times it repeats the test to ensure that the time per
loop is based on a reasonably large time interval.

not at all, I was puzzled by the reversal of earlier stuff. You're right
about the comprehension as well, I had thought the variable was
internal, but making it local gives the comprehension the edge.

C:\tmp>\python23\python ttt.py
list () = 1.72
list [] = 3.27
comprehension = 0.81
copy = 3.94
cCopy.copy = 1.59
lambda z: z[:] = 1.14
lambda z: list(z) = 4.73
lambda z: [] = 0.95
f=lambda z: [] = 0.95

like /F though I am extremely puzzled about the list result.
--
Robin Becker
Jul 18 '05 #23

P: n/a
Robin Becker <ro***@jessikat.fsnet.co.uk> wrote in
news:LI**************@jessikat.fsnet.co.uk:

C:\tmp>\python23\python ttt.py
list () = 1.72
list [] = 3.27
comprehension = 0.81
copy = 3.94
cCopy.copy = 1.59
lambda z: z[:] = 1.14
lambda z: list(z) = 4.73
lambda z: [] = 0.95
f=lambda z: [] = 0.95

like /F though I am extremely puzzled about the list result.


I just had a look at listobject.c, in particular the code for list_fill.

static int
list_fill(PyListObject *result, PyObject *v)
{
PyObject *it; /* iter(v) */
int n; /* guess for result list size */
int i;

n = result->ob_size;

/* Special-case list(a_list), for speed. */
if (PyList_Check(v)) {
if (v == (PyObject *)result )
return 0; /* source is destination, we're done */
return list_ass_slice(result, 0, n, v);
}

/* Empty previous contents */
if (n != 0) {
if (list_ass_slice(result, 0, n, (PyObject *)NULL) != 0)
return -1;
}

/* Get iterator. There may be some low-level efficiency to be gained
* by caching the tp_iternext slot instead of using PyIter_Next()
* later, but premature optimization is the root etc.
*/
... and so on
}

Timing 'python \python23\lib\timeit.py "list([])"' gave times around
4.5uSec per loop, whereas the same with "list(())" gives about 2.3uSec per
loop.

Putting #ifdef 0/#endif around the 'special case for speed' block makes the
first case go from 4.5uS to 2.4uS, and the second case also speeds up
marginally for good measure.

So it looks suspiciously like a premature optimisation 'for speed' should
say 'for reduced speed'.

Next I tried:
python \python23\lib\timeit.py -s"r=range(10000)" "list(r)"

and
python \python23\lib\timeit.py -s"r=tuple(range(10000))" "list(r)"

For this length of list the optimisation is a clear winner.

It looks to me as though the breakeven is around 100 elements in the list.
With fewer than 100 elements the optimisation slows things down, with more
it speeds them up.

I put the tests in a batch file:

------ test.bat --------
@echo off
@echo Length: 0
python \python23\lib\timeit.py -s"r=[]" "list(r)"
python \python23\lib\timeit.py -s"r=()" "list(r)"
@echo Length: 1
python \python23\lib\timeit.py -s"r=range(1)" "list(r)"
python \python23\lib\timeit.py -s"r=tuple(range(1))" "list(r)"
@echo Length: 10
python \python23\lib\timeit.py -s"r=range(10)" "list(r)"
python \python23\lib\timeit.py -s"r=tuple(range(10))" "list(r)"
@echo Length: 100
python \python23\lib\timeit.py -s"r=range(100)" "list(r)"
python \python23\lib\timeit.py -s"r=tuple(range(100))" "list(r)"
@echo Length: 1000
python \python23\lib\timeit.py -s"r=range(1000)" "list(r)"
python \python23\lib\timeit.py -s"r=tuple(range(1000))" "list(r)"
@echo Length: 10000
python \python23\lib\timeit.py -s"r=range(10000)" "list(r)"
python \python23\lib\timeit.py -s"r=tuple(range(10000))" "list(r)"
------------------------
With the original code I get:

C:\Pythonsrc\python\dist\src\PCbuild>test
Length: 0
100000 loops, best of 3: 4.38 usec per loop
100000 loops, best of 3: 2.32 usec per loop
Length: 1
100000 loops, best of 3: 5.26 usec per loop
100000 loops, best of 3: 2.4 usec per loop
Length: 10
100000 loops, best of 3: 5.39 usec per loop
100000 loops, best of 3: 2.84 usec per loop
Length: 100
100000 loops, best of 3: 7.44 usec per loop
100000 loops, best of 3: 7.26 usec per loop
Length: 1000
10000 loops, best of 3: 28.5 usec per loop
10000 loops, best of 3: 50.4 usec per loop
Length: 10000
1000 loops, best of 3: 258 usec per loop
1000 loops, best of 3: 505 usec per loop

With the optimisation code completely removed I get:
C:\Pythonsrc\python\dist\src\PCbuild>test
Length: 0
100000 loops, best of 3: 2.21 usec per loop
100000 loops, best of 3: 2.24 usec per loop
Length: 1
100000 loops, best of 3: 2.34 usec per loop
100000 loops, best of 3: 2.36 usec per loop
Length: 10
100000 loops, best of 3: 2.84 usec per loop
100000 loops, best of 3: 2.85 usec per loop
Length: 100
100000 loops, best of 3: 7.56 usec per loop
100000 loops, best of 3: 7.13 usec per loop
Length: 1000
10000 loops, best of 3: 53.2 usec per loop
10000 loops, best of 3: 50.4 usec per loop
Length: 10000
1000 loops, best of 3: 549 usec per loop
1000 loops, best of 3: 534 usec per loop

With the optimisation tweaked to ignore 0 length lists entirely and only
'optimise' for lists with more than 100 elements:

C:\Pythonsrc\python\dist\src\PCbuild>test
Length: 0
1000000 loops, best of 3: 1.39 usec per loop
100000 loops, best of 3: 2.31 usec per loop
Length: 1
100000 loops, best of 3: 2.28 usec per loop
100000 loops, best of 3: 2.33 usec per loop
Length: 10
100000 loops, best of 3: 2.85 usec per loop
100000 loops, best of 3: 2.84 usec per loop
Length: 100
100000 loops, best of 3: 7.43 usec per loop
100000 loops, best of 3: 7.14 usec per loop
Length: 1000
10000 loops, best of 3: 28.4 usec per loop
10000 loops, best of 3: 50.2 usec per loop
Length: 10000
1000 loops, best of 3: 256 usec per loop
1000 loops, best of 3: 532 usec per loop

The relevant part of list_fill now looks like this:

/* Special-case list(a_list), for speed. */
if (PyList_Check(v)) {
int vsize = ((PyListObject*)v)->ob_size;
if (v == (PyObject *)result || vsize==0)
return 0; /* nothing to copy */
if (vsize > 100)
return list_ass_slice(result, 0, n, v);
}
--
Duncan Booth du****@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
Jul 18 '05 #24

This discussion thread is closed

Replies have been disabled for this discussion.