List problem

Hello,

I have a question about the following code.
test_list = [1, 2, 3]

for i in test_list:
    print i
    if 1 in test_list:
        test_list.remove(1)
Why is the second item not printed?
Is that a bug or a feature?!
Thanks,
Thomas
Jul 18 '05 #1
Thomas M. <thomas.sunshine <at> web.de> writes:

test_list = [1, 2, 3]

for i in test_list:
    print i
    if 1 in test_list:
        test_list.remove(1)

Why is the second item not printed?


Maybe this will help illustrate the problem:
>>> test_list = [1, 2, 3]
>>> for i, item in enumerate(test_list):
...     print item
...     print "start", i, test_list[i], test_list
...     if 1 in test_list:
...         test_list.remove(1)
...     print "end", i, test_list[i], test_list
...
1
start 0 1 [1, 2, 3]
end 0 2 [2, 3]
3
start 1 3 [2, 3]
end 1 3 [2, 3]

On the first iteration of the loop, you are looking at item 0 in a 3-item list,
[1, 2, 3]. When you remove 1 from the list, you convert the 3-item list into a
2-item list, thus shifting the indices down, so that test_list[0] is now 2.
When the for loop continues, it is now looking at index 1, and the item at index
1 in your (adjusted) 2-item list is 3.
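
A way to see this mechanism directly (a sketch, not from the original post) is to drive the list iterator by hand; in Python 2, it.next() advances it one step:

>>> a = [1, 2, 3]
>>> it = iter(a)
>>> it.next()    # the iterator is at index 0: returns a[0], which is 1
1
>>> a.remove(1)  # a shrinks to [2, 3]; the iterator's next index is still 1
>>> it.next()    # returns a[1], which is 3 -- the 2 now at a[0] is skipped
3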

In general, removing elements from a list while you're iterating through the list
is a bad idea. Perhaps a better solution is a list comprehension:
>>> test_list = [1, 2, 3]
>>> [x for x in test_list if x != 1]
[2, 3]
>>> for item in [x for x in test_list if x != 1]:
...     print item
...
2
3

Steve

Jul 18 '05 #2
On Fri, 29 Oct 2004 17:52:03 +0000 (UTC), Steven Bethard
<st************@gmail.com> wrote:
Thomas M. <thomas.sunshine <at> web.de> writes:

test_list = [1, 2, 3]

for i in test_list:
    print i
    if 1 in test_list:
        test_list.remove(1)

Why is the second item not printed?
Maybe this will help illustrate the problem:
>>> test_list = [1, 2, 3]
>>> for i, item in enumerate(test_list):
...     print item
...     print "start", i, test_list[i], test_list
...     if 1 in test_list:
...         test_list.remove(1)
...     print "end", i, test_list[i], test_list
...
1
start 0 1 [1, 2, 3]
end 0 2 [2, 3]
3
start 1 3 [2, 3]
end 1 3 [2, 3]

On the first iteration of the loop, you are looking at item 0 in a 3-item list,
[1, 2, 3]. When you remove 1 from the list, you convert the 3-item list into a
2-item list, thus shifting the indices down, so that test_list[0] is now 2.
When the for loop continues, it is now looking at index 1, and the item at index
1 in your (adjusted) 2-item list is 3.

In general, removing elements from a list while you're iterating through the list
is a bad idea. Perhaps a better solution is a list comprehension:


Unless you're using a while loop and iterating in reverse, for
example:

a = [0,1,2,3,4]
pos_max = len(a) - 1   # Maximum iterable element
pos = pos_max          # Current element
while pos >= 0:
    if a[pos] == 2:
        a.remove(2)
        pos_max = pos_max - 1
    pos = pos - 1
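
For comparison, the same reverse-scan idea can be written with del by index, which skips remove()'s extra front-to-back search (a sketch, not from the original post; it assumes the goal is deleting every 2):

a = [0, 1, 2, 2, 3, 4]
pos = len(a) - 1
while pos >= 0:
    if a[pos] == 2:
        del a[pos]  # only the already-visited tail shifts, so pos stays valid
    pos = pos - 1
# a is now [0, 1, 3, 4]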

>>> test_list = [1, 2, 3]
>>> [x for x in test_list if x != 1]
[2, 3]
>>> for item in [x for x in test_list if x != 1]:
...     print item
...
2
3

Steve


Jul 18 '05 #3
User <1@2.3> wrote:
...
In general, removing elements from a list while you're iterating through
the list is a bad idea. Perhaps a better solution is a list
comprehension:


Unless you're using a while loop and iterating in reverse, for
example:

a = [0,1,2,3,4]
pos_max = len(a) - 1   # Maximum iterable element
pos = pos_max          # Current element
while pos >= 0:
    if a[pos] == 2:
        a.remove(2)
        pos_max = pos_max - 1
    pos = pos - 1


Still a bad idea.

a[:] = [x for x in a if x != 2]

is more concise, idiomatic, and speedy. No reason to do low-level
twiddling with indices and fiddly loops, in cases where a list
comprehension can do the job so simply.
Alex
Jul 18 '05 #4
Alex Martelli wrote:
User <1@2.3> wrote:
...
>In general, removing elements from a list while you're iterating through
>the list is a bad idea. Perhaps a better solution is a list
>comprehension:


Unless you're using a while loop and iterating in reverse, for
example:

a = [0,1,2,3,4]
pos_max = len(a) - 1   # Maximum iterable element
pos = pos_max          # Current element
while pos >= 0:
    if a[pos] == 2:
        a.remove(2)
        pos_max = pos_max - 1
    pos = pos - 1


Still a bad idea.

a[:] = [x for x in a if x != 2]

^^^
What are these for?

Reinhold

--
[Windows is like] the railway: you don't have to worry about anything, you
pay a surcharge for every little thing, the service is lousy, strangers can
board at any time, it's inflexible, and it's incompatible with every other
means of transport. -- Florian Diesch in dcoulm
Jul 18 '05 #5
Reinhold Birkenfeld wrote:
Alex Martelli wrote:
a[:] = [x for x in a if x != 2]


^^^
What are these for?


Observe the difference between these two approaches:

Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)]
>>> b = a = range(5)
>>> b is a
True
>>> a[:] = [x for x in a if x != 2]
>>> a
[0, 1, 3, 4]
>>> b
[0, 1, 3, 4]
>>> b is a
True
>>> b = a = range(5)
>>> a = [x for x in a if x != 2]
>>> b is a
False
[:] on the left side of the assignment basically causes
"slice assignment" to the whole list, modifying it in-place
instead of creating a new list.
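
Another quick way to confirm the in-place behavior (a sketch, not from the original post) is to compare the object's identity before and after:

>>> a = range(5)
>>> before = id(a)
>>> a[:] = [x for x in a if x != 2]  # slice assignment mutates the same list object
>>> id(a) == before
True
>>> a = [x for x in a if x != 2]     # plain assignment rebinds a to a brand-new list
>>> id(a) == before
False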

-Peter
Jul 18 '05 #6
Peter Hansen wrote:
[:] on the left side of the assignment basically causes
"slice assignment" to the whole list, modifying it in-place
instead of creating a new list.


*bang*

What you just heard was my hand hitting my head...

Reinhold

--
[Windows is like] the railway: you don't have to worry about anything, you
pay a surcharge for every little thing, the service is lousy, strangers can
board at any time, it's inflexible, and it's incompatible with every other
means of transport. -- Florian Diesch in dcoulm
Jul 18 '05 #7
On Sun, 31 Oct 2004 14:28:39 +0200, al*****@yahoo.com (Alex Martelli) wrote:
User <1@2.3> wrote:
...
>In general, removing elements from a list while you're iterating through
>the list is a bad idea. Perhaps a better solution is a list
>comprehension:


Unless you're using a while loop and iterating in reverse, for
example:

a = [0,1,2,3,4]
pos_max = len(a) - 1   # Maximum iterable element
pos = pos_max          # Current element
while pos >= 0:
    if a[pos] == 2:
        a.remove(2)
        pos_max = pos_max - 1
    pos = pos - 1


Still a bad idea.

a[:] = [x for x in a if x != 2]

is more concise, idiomatic, and speedy. No reason to do low-level
twiddling with indices and fiddly loops, in cases where a list
comprehension can do the job so simply.

Except maybe if the list is huge, and you don't want to generate a duplicate, e.g.,
>>> a = [1,2,3,2,4,5]
>>> a
[1, 2, 3, 2, 4, 5]
>>> i = 0
>>> for x in a:
...     if x != 2:
...         a[i] = x
...         i += 1
...
>>> del a[i:]
>>> a
[1, 3, 4, 5]

Should also be faster for large lists, IWT.

Regards,
Bengt Richter
Jul 18 '05 #8
Bengt Richter <bo**@oz.net> wrote:
...
a[:] = [x for x in a if x != 2]

is more concise, idiomatic, and speedy. No reason to do low-level
twiddling with indices and fiddly loops, in cases where a list
comprehension can do the job so simply.
Except maybe if the list is huge, and you don't want to generate a
duplicate, e.g., ...

Should also be faster for large lists, IWT.


'shud' is a 4-letter word -- I prefer to measure. Of course, it does
depend what you mean by large/huge -- I hope 10 million items is enough;
and what you're doing exactly -- I've kept 'deleting item 2' as it was
the example that had been used so far.

This is the module for measurement:

def delsome_br(n=1000000, todel=[2]):
    a = range(n)
    i = 0
    for x in a:
        if x not in todel:
            a[i] = x
            i += 1
    del a[i:]
    return a

def delsome_am(n=1000000, todel=[2]):
    a = range(n)
    a = [x for x in a if x not in todel]

and these are the numbers I see:

kallisti:/tmp alex$ python -mtimeit -s'import br' 'br.delsome_br()'
10 loops, best of 3: 4.44 sec per loop
kallisti:/tmp alex$ python -mtimeit -s'import br' 'br.delsome_am()'
10 loops, best of 3: 3.92 sec per loop

Python 2.4b1, of course -- somebody SO interested in performance, as to
use those six delicate, fragile low-level lines, instead of the solid,
no-space-for-bugs list comprehension, surely WANTS the performance
pluses of 2.4, anyway;-).

I'm sure platforms and tasks can be devised that make delsome_br faster.
However, personally, I'm not surprised that, on a task I tried to code
as impartially as possible, the "should be faster" fussy, low-level
solution ends up taking _longer_ than the simpler, higher-level one;
it's not always that way, but it matches my general Python experience...
"simple is good, simple is fast, high-level is simple" memes;-).

The tasks ARE a bit time-consuming, as you see (30 times 4+ seconds is
over a couple of minutes per variation...), so I'm not going to dwell
much deeper, but, of course, that's just why I posted everything -- so
ingenious people can sweat it out and find a way to show the lower level
version as faster, with the appropriate combination of platform and
tasks. Some suggestions...:

The trick would be to hit just the right size of list that will make the
working set of the LC exceed available physical memory, while leaving
the low-level approach with a working set that still fits; the thrashing
will then kill the performance of one, but not the other. To avoid
having to aim for many tens of millions of items, a machine with scarce
memory or very ineffective VM algorithms would be better -- some old PC
with 64M of RAM and win/98 might be ideal, for example. Unfortunately I
don't have any machine with less than 640 M any more, even the laptops
(RAM is so cheap, and SO precious to performance, that I can't justify
stinting on THAT, of all items!), so I can't really help much there.
Alex
Jul 18 '05 #9
On Sun, 31 Oct 2004 14:28:39 +0200, al*****@yahoo.com (Alex Martelli)
wrote:
User <1@2.3> wrote:
...
>In general, removing elements from a list while you're iterating through
>the list is a bad idea. Perhaps a better solution is a list
>comprehension:


Unless you're using a while loop and iterating in reverse, for
example:

a = [0,1,2,3,4]
pos_max = len(a) - 1   # Maximum iterable element
pos = pos_max          # Current element
while pos >= 0:
    if a[pos] == 2:
        a.remove(2)
        pos_max = pos_max - 1
    pos = pos - 1


Still a bad idea.

a[:] = [x for x in a if x != 2]

is more concise, idiomatic, and speedy. No reason to do low-level
twiddling with indices and fiddly loops, in cases where a list
comprehension can do the job so simply.
Alex

Interesting piece of code. In the example I gave, I was really just
trying to simplify a rather messy function I wrote for checking a list
of files for duplicates. It is messy because it does two passes (read
up to a user specified maximum, if they still match, read entire files
and compare). I use this approach because I believe it is most
efficient in terms of not making any more comparisons than necessary.
I initially attempted to do this with a "for" loop, but ended up with
a mess when I started taking things out of it. Anyway, here is the
original function:
def dupe_check(input_list, init_read):

    # Function checks a list of files for duplicates
    # Input list format is ['c:\\junk\\trash.txt', 'd:\\stuff\\junk.jpg', ...]
    # init_read = max file size read on first pass.
    # Function returns a list of lists of matching files.

    a = input_list
    tot = 0; fail = 0
    q = []; pos = 0; len_a = len(a) - 1
    while len_a >= 0:
        file1 = a[len_a]
        dx = read_close(file1, 'rb', init_read)
        q.append([a[len_a]])
        a.remove(a[len_a])
        len_a = len_a - 1
        len_sub = len_a
        while len_sub >= 0:
            dy = read_close(a[len_sub], 'rb', init_read)
            if dx == dy:
                tot = tot + 1
                if read_close(a[len_sub], 'rb', -1) == read_close(file1, 'rb', -1):
                    q[pos].append(a[len_sub])
                    a.remove(a[len_sub])
                    len_a = len_a - 1
            else:
                fail = fail + 1
            len_sub = len_sub - 1
        pos = pos + 1

    # copy only lists of two or more from q to z

    z = []
    for x in q:
        if len(x) > 1:
            z.append(x)
    if tot > 0:
        perc_fails = 100.0 * float(fail)/float(tot)
    else:
        perc_fails = 0.0
    return z, perc_fails

def read_close(file, mode, max_read):
    # Opens a file, reads its data, closes
    # the file, then returns the data

    f = open(file, mode)
    data = f.read(max_read)
    f.close()
    return data
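
An editorial aside, not from the thread: the pairwise rescans above can be avoided by grouping files in a dict keyed on the data from the initial read, so each file is read once per pass. A minimal sketch reusing the read_close helper above; dupe_check_grouped is a hypothetical name, and for large files a digest of the data (e.g. md5) would make a lighter dict key than the raw bytes:

def dupe_check_grouped(input_list, init_read):
    # First pass: group files by their first init_read bytes.
    groups = {}
    for name in input_list:
        key = read_close(name, 'rb', init_read)
        groups.setdefault(key, []).append(name)
    # Second pass: within each candidate group, compare full contents.
    result = []
    for names in groups.values():
        if len(names) < 2:
            continue
        full = {}
        for name in names:
            full.setdefault(read_close(name, 'rb', -1), []).append(name)
        result.extend([g for g in full.values() if len(g) > 1])
    return result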

Jul 18 '05 #10
On Mon, 1 Nov 2004 00:45:24 +0200, al*****@yahoo.com (Alex Martelli) wrote:
Bengt Richter <bo**@oz.net> wrote:
...
> a[:] = [x for x in a if x != 2]
    ^^^^-[1]
>
>is more concise, idiomatic, and speedy. No reason to do low-level
>twiddling with indices and fiddly loops, in cases where a list
>comprehension can do the job so simply.
> Except maybe if the list is huge, and you don't want to generate a
> duplicate, e.g., ...
> Should also be faster for large lists, IWT.


'shud' is a 4-letter word -- I prefer to measure. Of course, it does
depend what you mean by large/huge -- I hope 10 million items is enough;
and what you're doing exactly -- I've kept 'deleting item 2' as it was
the example that had been used so far.

This is the module for measurement:

def delsome_br(n=1000000, todel=[2]):
    a = range(n)
    i = 0
    for x in a:
        if x not in todel:
            a[i] = x
            i += 1
    del a[i:]
    return a

def delsome_am(n=1000000, todel=[2]):
    a = range(n)
    a = [x for x in a if x not in todel]

^-- shud be a[:] ;-)

(I think the missing return a is in the noise ;-)
and these are the numbers I see:

kallisti:/tmp alex$ python -mtimeit -s'import br' 'br.delsome_br()'
10 loops, best of 3: 4.44 sec per loop
kallisti:/tmp alex$ python -mtimeit -s'import br' 'br.delsome_am()'
10 loops, best of 3: 3.92 sec per loop
I thought you had 'way faster machines than mine. Neither time is an improvement
over delsome_br for 2.3.2 on my old box. I wonder what's up.
Python 2.4b1, of course -- somebody SO interested in performance, as to
use those six delicate, fragile low-level lines, instead of the solid,
no-space-for-bugs list comprehension, surely WANTS the performance
pluses of 2.4, anyway;-).

Python 2.3.2 (#49, Oct 2 2003, 20:02:00) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> def delsome_br(n=1000000, todel=[2]):
...     a = range(n)
...     i = 0
...     for x in a:
...         if x not in todel:
...             a[i] = x
...             i += 1
...     del a[i:]
...     return a
...
>>> def delsome_am(n=1000000, todel=[2]):
...     a = range(n)
...     a = [x for x in a if x not in todel]
...
>>> from time import clock
>>> for f in delsome_br, delsome_am:
...     print f.__name__
...     for i in range(3):
...         t0 = clock()
...         ignore = f()
...         print clock()-t0
...
delsome_br
3.39048024526
3.63548729364
3.6326486655
delsome_am
6.40547292869
6.32403355062
6.21752171923

I'm sure platforms and tasks can be devised that make delsome_br faster.
However, personally, I'm not surprised that, on a task I tried to code
as impartially as possible, the "should be faster" fussy, low-level
solution ends up taking _longer_ than the simpler, higher-level one;
it's not always that way, but it matches my general Python experience...
"simple is good, simple is fast, high-level is simple" memes;-).

a[:] would improve impartiality a smidge ;-) And it changes the numbers on my box:

delsome_br
3.36129106876
3.63628348399
3.63185750372
delsome_am
6.77888285274
6.76994708267
6.74697154332

I agree, really ;-) Ideally optimization should eventually drive equivalent
functionality towards equivalent compiled code anyway.
The tasks ARE a bit time-consuming, as you see (30 times 4+ seconds is
over a couple of minutes per variation...), so I'm not going to dwell
much deeper, but, of course, that's just why I posted everything -- so
ingenious people can sweat it out and find a way to show the lower level
version as faster, with the appropriate combination of platform and
tasks. Some suggestions...:

The trick would be to hit just the right size of list that will make the
working set of the LC exceed available physical memory, while leaving
the low-level approach with a working set that still fits; the thrashing
will then kill the performance of one, but not the other. To avoid
having to aim for many tens of millions of items, a machine with scarce
memory or very ineffective VM algorithms would be better -- some old PC
with 64M of RAM and win/98 might be ideal, for example. Unfortunately I
don't have any machine with less than 640 M any more, even the laptops
(RAM is so cheap, and SO precious to performance, that I can't justify
stinting on THAT, of all items!), so I can't really help much there.

I did mention "huge" and "maybe" ;-)

Ok, point made ;-)

Regards,
Bengt Richter
Jul 18 '05 #11

bo**@oz.net (Bengt Richter) wrote:
delsome_br
3.39048024526
3.63548729364
3.6326486655
delsome_am
6.40547292869
6.32403355062
6.21752171923


FYI: list comprehensions in 2.4 are significantly faster than list
comprehensions in 2.3, which is likely where a large portion of your
difference is coming from.

I wasn't sure that I believed your statements in regards to Python's
cache performance, so I thought I'd give it a look...

Python 2.3.2 (#49, Oct 2 2003, 20:02:00) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> a = range(1000000)
>>> import time
>>> def f(a):
...     t = time.time()
...     for i in a:
...         pass
...     return time.time()-t
...
>>> for i in xrange(10):
...     print f(a),
...
0.421999931335 0.421000003815 0.421999931335 0.422000169754 0.421999931335
0.43799996376 0.421000003815 0.421999931335 0.43799996376 0.406000137329
>>> import random
>>> random.shuffle(a)
>>> for i in xrange(10):
...     print f(a),
...
0.594000101089 0.59299993515 0.610000133514 0.608999967575 0.608999967575
0.594000101089 0.593999862671 0.608999967575 0.594000101089 0.593999862671
>>> a.sort()
>>> for i in xrange(10):
...     print f(a),
...
0.421999931335 0.421999931335 0.421999931335 0.422000169754 0.43700003624
0.421999931335 0.421999931335 0.421999931335 0.422000169754 0.43700003624
Looks to be an almost 50% slowdown on my dual Celeron 400MHz machine
(128kb cache). Very interesting; I would never have expected as
much (due to the amount of overhead of Python data structures, I
believed it was generally a wash).

- Josiah

Jul 18 '05 #12
Bengt Richter <bo**@oz.net> wrote:
...
def delsome_am(n=1000000, todel=[2]):
    a = range(n)
    a = [x for x in a if x not in todel]
      ^-- shud be a[:] ;-)
(I think the missing return a is in the noise ;-)


Right, sorry.
kallisti:/tmp alex$ python -mtimeit -s'import br' 'br.delsome_br()'
10 loops, best of 3: 4.44 sec per loop
kallisti:/tmp alex$ python -mtimeit -s'import br' 'br.delsome_am()'
10 loops, best of 3: 3.92 sec per loop

I thought you had 'way faster machines than mine. Neither time is an
improvement over delsome_br for 2.3.2 on my old box. I wonder what's up.


I ran these on an 800 MHz laptop, the one I invariably use for netting
-- best ergonomics. OK, I'll try with faster machines when I get a
chance -- I'm surprised you measure almost a 1:2 ratio with _br faster.
Alex
Jul 18 '05 #13
On Mon, 1 Nov 2004 09:46:09 +0200, al*****@yahoo.com (Alex Martelli) wrote:
Bengt Richter <bo**@oz.net> wrote:
...
> def delsome_am(n=1000000, todel=[2]):
>     a = range(n)
>     a = [x for x in a if x not in todel]
>       ^-- shud be a[:] ;-)
> (I think the missing return a is in the noise ;-)


Right, sorry.
>kallisti:/tmp alex$ python -mtimeit -s'import br' 'br.delsome_br()'
>10 loops, best of 3: 4.44 sec per loop
>kallisti:/tmp alex$ python -mtimeit -s'import br' 'br.delsome_am()'
>10 loops, best of 3: 3.92 sec per loop
>

I thought you had 'way faster machines than mine. Neither time is an
improvement over delsome_br for 2.3.2 on my old box. I wonder what's up.


I ran these on an 800 MHz laptop, the one I invariably use for netting
-- best ergonomics. OK, I'll try with faster machines when I get a
chance -- I'm surprised you measure almost a 1:2 ratio with _br faster.

Maybe the listcomp optimizations were introduced in 2.3.4? Or even 2.4.x?

I have Python 2.3.2 on an old 300 MHz P2 with 320 MB RAM and NT4 as OS,
so the mystery deepens ;-)

Regards,
Bengt Richter
Jul 18 '05 #14
Bengt Richter <bo**@oz.net> wrote:
...
I ran these on an 800 MHz laptop, the one I invariably use for netting
-- best ergonomics. OK, I'll try with faster machines when I get a
chance -- I'm surprised you measure almost a 1:2 ratio with _br faster.
Maybe the listcomp optimizations were introduced in 2.3.4? Or even 2.4.x?


2.4, I think.
I have Python 2.3.2 on an old 300 MHz P2 with 320 MB RAM and NT4 as OS,
so the mystery deepens ;-)


The real mystery is why you don't download and install 2.4b1, which can
perfectly live side by side with your (outdated) 2.3 install...!-)

Python's quality depends on people downloading and trying out the betas,
after all. If knowledgeable enthusiasts like you don't do it, who?-)
Alex
Jul 18 '05 #15
On Mon, 01 Nov 2004 00:13:18 -0700, Josiah Carlson <jc******@uci.edu> wrote:

bo**@oz.net (Bengt Richter) wrote:
delsome_br
3.39048024526
3.63548729364
3.6326486655
delsome_am
6.40547292869
6.32403355062
6.21752171923
FYI: list comprehensions in 2.4 are significantly faster than list
comprehensions in 2.3, which is likely where a large portion of your
difference is coming from.

Sounds reasonable.
I wasn't sure that I believed your statements in regards to Python's
cache performance, so I thought I'd give it a look...

I'm not sure which statements you are referring to, but if it's about
bad job sharing between multiple processors, I don't think that's what's
causing the result you get below. I get a similar slowdown after shuffling.

I would guess only one of your processors is really doing the work of the
interpreter, and the other mostly idling and taking care of i/o and interrupts.

So why would a single processor slow down? The low-level work in 'for i in a: pass' must
be binding i to successive integers, however they are laid out. That means incrementing and
decrementing ref counts on the integer objects, and doing so either in linear
memory order or in shuffled order, with the integer objects themselves not moving
at all, even after the shuffle of the references to them in a.

So how big is an L2 cache? 512k bytes? How big is an int object? Even if type info is
shared for an arena, each must be at least a value,refcount pair, or 8 bytes.
I'd bet on no tricks for the type info, so add a pointer for that -> 12 bytes.
So 512k/12 is about 43k integer objects in the L2 cache. But we have a million, so if they
are laid out linearly, that's 1e6*12/512k, or about 22.89 int objects competing for
each cache slot that can span an integer. Ok, but the L2 cache communicates with the CPU
cache, which is much smaller, maybe 16 or 32k: 8k 32-bit values in 2k 128-bit chunks.
Maybe ;-) Or about 2700 12-byte int objects.

But cache lines probably go back and forth to L2 memory 128 bits wide, or 16 bytes.
That doesn't match 12 bytes, so what happens when you walk orderly through type, refcount, value
triplets, presumably reading the type and reading and writing the refcount twice, and ignoring
the value, since we're doing pass?

Anyway, I'm not sure how smart L2 cache is with communicating to slower DRAM, which might
be interleaved in some cases, and have narrower data path. So many factors.

But you can see that reading 3*16 bytes sequentially will get you 4*12-byte objects, where
randomly you might have to read 4*16 bytes or even more to get your 4*12, depending on the
overlaps. If that is what's doing it, going to a different int representation, where only the
refcount and value are stored and the type info is separate, then the 8-byte lineup with cache
chunks ought to make for a noticeable speedup for int ops. Of course, I don't know what
percentage of all ops that is. I guess someone instruments the interpreter for that kind of
statistics? Timbot?

Python 2.3.2 (#49, Oct 2 2003, 20:02:00) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> a = range(1000000)
>>> import time
>>> def f(a):
...     t = time.time()
...     for i in a:
...         pass
...     return time.time()-t
...
>>> for i in xrange(10):
...     print f(a),
...
0.421999931335 0.421000003815 0.421999931335 0.422000169754 0.421999931335
0.43799996376 0.421000003815 0.421999931335 0.43799996376 0.406000137329
>>> import random
>>> random.shuffle(a)
>>> for i in xrange(10):
...     print f(a),
...
0.594000101089 0.59299993515 0.610000133514 0.608999967575 0.608999967575
0.594000101089 0.593999862671 0.608999967575 0.594000101089 0.593999862671
>>> a.sort()
>>> for i in xrange(10):
...     print f(a),
...
0.421999931335 0.421999931335 0.421999931335 0.422000169754 0.43700003624
0.421999931335 0.421999931335 0.421999931335 0.422000169754 0.43700003624
Looks to be an almost 50% slowdown on my dual Celeron 400MHz machine
(128kb cache). Very interesting; I would never have expected as
much (due to the amount of overhead of Python data structures, I
believed it was generally a wash).


Regards,
Bengt Richter
Jul 18 '05 #16

bo**@oz.net (Bengt Richter) wrote:

On Mon, 01 Nov 2004 00:13:18 -0700, Josiah Carlson <jc******@uci.edu> wrote:
I wasn't sure that I believed your statements in regards to Python's
cache performance, so I thought I'd give it a look...
I'm not sure which statements you are referring to, but if it's about
bad jobsharing between multiple processors, I don't think that's what's
causing the result you get below. I get a similar slowdown after shuffling.


For some reason, when someone mentioned 'working set size', I got to
thinking of processor caches and not VM working sets.

Sorry about that, I had just gotten back from throwing a party, and my
brain was pretty fried.

Really the numbers I quoted don't show anything interesting in regards
to dual vs single processor performance, but it does show interaction
between processor caches and locality in Python programs.

When created, the original list has Python integers placed sequentially
in the integer free list, so when iterating over them, we get reasonable
cache performance.

After shuffled, we are basically doing random reads through the integer
free list; and since my processor's cache is so much smaller than the
integer free list, we observe a ~50% slowdown.

[snip] So how big an L2 cache? 512k bytes? How big is an int object? Even if type info is
shared for an arena, it must be a value,refcount pair each at least, or 8 bytes.
I guess I'd bet on no tricks for type info, so that's a pointer for that -> 12 bytes.


Though I could have sworn int objects each took 16 bytes, testing it out
(because I can't find the proper C source file in Python CVS) puts it at
12 bytes/int. Curse my human brain.

[snip discussion of cache sizes, interleaving, etc.]

Anyway, it is interesting to note that Python does show the classic
memory locality and caching behavior, even though it doesn't change
performance all that much.
- Josiah

Jul 18 '05 #17
On Mon, 1 Nov 2004 11:01:15 +0200, al*****@yahoo.com (Alex Martelli) wrote:
Bengt Richter <bo**@oz.net> wrote:
...
>I ran these on an 800 MHz laptop, the one I invariably use for netting
>-- best ergonomics. OK, I'll try with faster machines when I get a
>chance -- I'm surprised you measure almost a 1:2 ratio with _br faster.
>

Maybe the listcomp optimizations were introduced in 2.3.4? Or even 2.4.x?


2.4, I think.
I have Python 2.3.2 on an old 300 MHz P2 with 320 MB RAM and NT4 as OS,
so the mystery deepens ;-)


The real mystery is why you don't download and install 2.4b1, which can
perfectly live side by side with your (outdated) 2.3 install...!-)

Python's quality depends on people downloading and trying out the betas,
after all. If knowledgeable enthusiasts like you don't do it, who?-)

Ok, I made an effort ;-) The newfangled M$ installer is not compatible
with my NT4 service pack level, so I will have to build from tgz. Ok, but
my disk is too full for that at the moment ;-( So it will have to wait.
Sorry.

Regards,
Bengt Richter
Jul 18 '05 #18
