473,804 Members | 2,261 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

generator expressions: performance anomaly?

Please consider the timings below, where a generator expression starts
out slower than the equivalent list comprehension, and gets worse:
python -m timeit -s "orig=range(100 000)" "lst=orig[:];lst[:]=(x for x in orig)"
10 loops, best of 3: 6.84e+004 usec per loop
python -m timeit -s "orig=range(200 000)" "lst=orig[:];lst[:]=(x for x in orig)"
10 loops, best of 3: 5.22e+005 usec per loop
python -m timeit -s "orig=range(300 000)" "lst=orig[:];lst[:]=(x for x in orig)"
10 loops, best of 3: 1.32e+006 usec per loop
python -m timeit -s "orig=range(100 000)" "lst=orig[:];lst[:]=[x for x in orig]"
10 loops, best of 3: 6.15e+004 usec per loop
python -m timeit -s "orig=range(200 000)" "lst=orig[:];lst[:]=[x for x in orig]"
10 loops, best of 3: 1.43e+005 usec per loop
python -m timeit -s "orig=range(300 000)" "lst=orig[:];lst[:]=[x for x

in orig]"
10 loops, best of 3: 2.33e+005 usec per loop

Specs: Python 2.4, Windows 2000, 1.4GHz Athlon chip, 768Mb of memory.

Background: There was/is a very recent thread about ways of removing
all instances of x from a list. /F proposed a list comprehension to
build the result list. Given a requirement to mutate the original list,
this necessitates the assignment to lst[:]. I tried a generator
expression as well. However while the listcomp stayed competitive up to
a million-element list, the genexp went into outer space, taking about
20 times as long. The above timeit runs show a simpler scenario where
the genexp also seems to be going quadratic.
Comments, clues, ... please.

TIA,
John

Jul 18 '05 #1
10 1913
John Machin wrote:
Given a requirement to mutate the original list, this necessitates the assignment
to lst[:].


do you always pull requirements out of thin air?

</F>

Jul 18 '05 #2
John Machin wrote:
Background: There was/is a very recent thread about ways of removing
all instances of x from a list. /F proposed a list comprehension to
build the result list. Given a requirement to mutate the original list,
this necessitates the assignment to lst[:]. I tried a generator
expression as well. However while the listcomp stayed competitive up to
a million-element list, the genexp went into outer space, taking about
20 times as long. The above timeit runs show a simpler scenario where
the genexp also seems to be going quadratic.
Comments, clues, ... please.


Py> lc = [x for x in range(100)]
Py> len(lc)
100
Py> ge = (x for x in range(100))
Py> len(ge)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: len() of unsized object

It would be nice if unconditional ge's with known length inputs propagated
__len__, but that is not currently the case. There's a similar performance
glitch associated with constructing a tuple from a generator expression (with
vanilla 2.4, detouring via list is actually faster)

Cheers,
Nick.

--
Nick Coghlan | nc******@email. com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net
Jul 18 '05 #3

Fredrik Lundh wrote:
John Machin wrote:
Given a requirement to mutate the original list, this necessitates the assignment to lst[:].


do you always pull requirements out of thin air?

</F>


I was interested in doing a fair comparison between all the proposed
methods of deleting multiple occurrences from a list, and as the
majority were permuting the list in place, I shoe-horned the listcomp
and genexp methods into the same mould. If you choose to describe that
as pulling requirements out of thin air, feel free.

Do you have any suggestions as to why a generator expression would
perform much worse than the equivalent list comprehension?

[OT]
Anyway, I don't have time for chit-chat right now. My wife got a
digital camera for Xmas and dumped a whole bunch of 5 megapixel JPEGs
on me and I'm in the middle of writing a quick script to resize them,
using some imaging library I just downloaded off the web, written by
some grumpy PILlock from Sweden, evidently :-)

Cheers,
John

Jul 18 '05 #4
"John Machin" <sj******@lexic on.net> wrote in message
news:11******** *************@c 13g2000cwb.goog legroups.com...
Please consider the timings below, where a generator expression starts
out slower than the equivalent list comprehension, and gets worse:
python -m timeit -s "orig=range(100 000)" "lst=orig[:];lst[:]=(x for x in orig)"

. . .
python -m timeit -s "orig=range(200 000)" "lst=orig[:];lst[:]=(x for x

in orig)"


This has nothing to do with genexps and everything to do with list slice
assignment.

List slice assignment is an example of a tool with a special case optimization
for inputs that know their own length -- that enables the tool to pre-allocate
its result rather than growing and resizing in spurts. Other such tools include
tuple(), map() and zip().

The incredulous tone of the posting suggests a presumption that genexps always
outperform list comps. That is not the case. Sometimes you want a list instead
of a generator because you want more than just iteration; perhaps the consumer
function may be able to take advantage of length reporting, indexing, slicing,
re-iterability or other useful list behaviors.

Also, the timing jig bites. Use xrange() instead of range(). Otherwise,
there's an immediate forfeit of the memory savings normally being sought by the
use of generators and iterators.
Background: There was/is a very recent thread about ways of removing
all instances of x from a list. /F proposed a list comprehension to
build the result list.
FWIW, deques have a natural idiom for this kind of situation:

for i in xrange(len(myde que)):
x = mydeque.popleft ()
if some_condition( x):
mydeque.append( x)
Given a requirement to mutate the original list,
this necessitates the assignment to lst[:].
Sometimes people introduce arbitrary requirements and twist themselves in knots
when real applications tend to allow a wider range of alternative solutions.
There is a funky code smell emanating from any code resembling
a[:]=some_iterator. It would make me examine how 'a' is being used and check
whether the surrounding code can use the iterator or an new object.
Comments, clues, ... please.


As a point of interest, note that Py2.4 improved some of its built-in iterators
to report their length if requested. Now you now why.
d = dict(a=1, b=2, c=3, d=4)
len(d.iteritems ())

4

Raymond Hettinger
Jul 18 '05 #5
"Raymond Hettinger" <vz******@veriz on.net> writes:
As a point of interest, note that Py2.4 improved some of its
built-in iterators to report their length if requested. Now you know why.
d = dict(a=1, b=2, c=3, d=4)
len(d.iteritems ())

4


Is len(d.iteritems ()) ever different from just len(d)?
Jul 18 '05 #6
[Raymond Hettinger]
As a point of interest, note that Py2.4 improved some of its
built-in iterators to report their length if requested. Now you know why.
>> d = dict(a=1, b=2, c=3, d=4)
>> len(d.iteritems ()) 4
[Paul Rubin] Is len(d.iteritems ()) ever different from just len(d)?


Yes, after the iteration starts:
d = dict(a=1, b=2, c=3, d=4)
it = d.iteritems()
len(it) 4 it.next() ('a', 1) len(it)

3

In general, len(it) should give you what you would get with len(list(it)) but
without consuming the iterator.
Raymond
Jul 18 '05 #7
Paul Rubin wrote:
>d = dict(a=1, b=2, c=3, d=4)
>len(d.iter items())


4

Is len(d.iteritems ()) ever different from just len(d)?


You mean, for an arbitrary object d? Certainly:
class D: .... def iteritems(self) :return {}.iteritems()
.... def __len__(self): return 1
.... d=D()
len(d) 1 len(d.iteritems ())

0

Why do you ask?

Regards,
Martin
Jul 18 '05 #8
Nick Coghlan:
There's a similar performance glitch associated with constructing a

tuple from a generator expression (with vanilla 2.4, detouring via list
is actually faster)

You look right:

..from time import clock
..print "[x for x in l], list(x for x in l), aux = list(x for x in l);
tuple(aux), tuple(x for x in l):"
..for np in range(13, 21):
.. n = 1*2**np
.. print "n:", n, " s:",
.. l = xrange(n)
.. t1 = clock()
.. [x for x in l]
.. t2 = clock()
.. print round(t2-t1,3),
..
.. t1 = clock()
.. list(x for x in l)
.. t2 = clock()
.. print round(t2-t1,3),
..
.. t1 = clock()
.. aux = list(x for x in l)
.. tuple(aux)
.. t2 = clock()
.. print round(t2-t1,3),
..
.. #l = tuple(l) useless
.. t1 = clock()
.. tuple(x for x in l)
.. t2 = clock()
.. print round(t2-t1,3)
Output:
[x for x in l], list(x for x in l), aux = list(x for x in l);
tuple(aux), tuple(x for x in l):
n: 8192 s: 0.01 0.007 0.01 0.013
n: 16384 s: 0.024 0.019 0.021 0.032
n: 32768 s: 0.054 0.042 0.049 0.113
n: 65536 s: 0.111 0.078 0.101 0.088
n: 131072 s: 0.196 0.155 0.183 0.177
n: 262144 s: 0.436 0.385 0.429 1.832
n: 524288 s: 0.921 0.744 0.877 8.271
n: 1048576 s: 1.86 1.546 1.866 33.154

The timings for tuple(x for x in l) seem to grow as len(n)**2.
Bear hugs,
Bearophile

Jul 18 '05 #9
On Sun, 16 Jan 2005 12:18:23 GMT, "Raymond Hettinger"
<vz******@veriz on.net> wrote:
"John Machin" <sj******@lexic on.net> wrote in message
news:11******* **************@ c13g2000cwb.goo glegroups.com.. .
Please consider the timings below, where a generator expression starts
out slower than the equivalent list comprehension, and gets worse:
>python -m timeit -s "orig=range(100 000)" "lst=orig[:];lst[:]=(x for x

in orig)"

. . .
>python -m timeit -s "orig=range(200 000)" "lst=orig[:];lst[:]=(x for x

in orig)"


This has nothing to do with genexps and everything to do with list slice
assignment.

List slice assignment is an example of a tool with a special case optimization
for inputs that know their own length -- that enables the tool to pre-allocate
its result rather than growing and resizing in spurts. Other such tools include
tuple(), map() and zip().


My reading of the source: if the input is not a list or tuple, a
(temporary) tuple is built from the input, using PySequence_Tupl e() in
abstract.c. If the input cannot report its own length, then that
function resorts to "growing and resizing in spurts", using the
following code:

if (j >= n) {
if (n < 500)
n += 10;
else
n += 100;
if (_PyTuple_Resiz e(&result, n) != 0) {

Perhaps it could be changed to use a proportional increase, like
list_resize() in listobject.c, which advertises (amortised) linear
time. Alternative: build a temporary list instead?
Jul 18 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

72
4431
by: Raymond Hettinger | last post by:
Peter Norvig's creative thinking triggered renewed interest in PEP 289. That led to a number of contributors helping to re-work the pep details into a form that has been well received on the python-dev list: http://www.python.org/peps/pep-0289.html In brief, the PEP proposes a list comprehension style syntax for creating fast, memory efficient generator expressions on the fly: sum(x*x for x in roots)
9
2691
by: Francis Avila | last post by:
A little annoyed one day that I couldn't use the statefulness of generators as "resumable functions", I came across Hettinger's PEP 288 (http://www.python.org/peps/pep-0288.html, still listed as open, even though it's at least a year old and Guido doesn't seem very hot on the idea). I'm not too sure of its ideas on raising exceptions in generators from outside (although it looks like it might be convenient in some cases), but being able...
24
3362
by: Mahesh Padmanabhan | last post by:
Hi, When list comprehension was added to the language, I had a lot of trouble understanding it but now that I am familiar with it, I am not sure how I programmed in Python without it. Now I see that generator expressions have been added to the language with 2.4 and I question the need for it. I know that it allows for lazy evaluation which speeds things up for larger lists but why was it necessary to add it instead of improving list...
2
1864
by: zipher | last post by:
It seems the debate over PEP 308 (if-then-else expression) occurred prior to the arrival of generator expressions. Mightn't this new latter syntax be the ticket to a "one obvious way" to write a ternary expression in python? >>> (foo(i) if i==42 else bar(i)) # i==42 ? foo(i) : bar(i) zipher
45
3054
by: Joh | last post by:
hello, i'm trying to understand how i could build following consecutive sets from a root one using generator : l = would like to produce : , , , ,
23
2286
by: Mike Meyer | last post by:
Ok, we've added list comprehensions to the language, and seen that they were good. We've added generator expressions to the language, and seen that they were good as well. I'm left a bit confused, though - when would I use a list comp instead of a generator expression if I'm going to require 2.4 anyway? Thanks, <mike --
3
1166
by: mk | last post by:
Hi. I was hoping someone could explain weird performance anomaly. I ran a small test to compare a new X6800 Core 2 Duo build against a three year old P4EE: (Both machines running VS2005) DateTime dt = DateTime.Now; string s = String.Empty; for (int i = 0; i < 9999; i++) { s += new string('x', 1000);
7
1282
by: mk | last post by:
Hi. This is probably not the best place to post this, but I was hoping someone could explain weird performance anomaly. I ran a small test to compare a new X6800 Core 2 Duo build against a three year old P4EE: (Both machines running VS2005) DateTime dt = DateTime.Now; string s = String.Empty; for (int i = 0; i < 9999; i++) { s += new string('x', 1000);
0
9714
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9594
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10350
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
10096
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
9174
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
6866
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5534
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5673
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
3834
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.