Bytes | Software Development & Data Engineering Community
List Performance

If I happen to have a list that contains over 50,000 items, will the
size of the list severely impact the performance of appending to the
list?
Jun 30 '08 #1
Ampedesign wrote:
If I happen to have a list that contains over 50,000 items, will the
size of the list severely impact the performance of appending to the
list?
No.

$ python -m timeit -n20000 -s"items = []" "items.append(42)"
20000 loops, best of 3: 0.554 usec per loop
$ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
20000 loops, best of 3: 0.529 usec per loop

http://wiki.python.org/moin/TimeComplexity

Peter
Jun 30 '08 #2
On Monday 30 June 2008 09:23:46, Peter Otten wrote:
Ampedesign wrote:
If I happen to have a list that contains over 50,000 items, will the
size of the list severely impact the performance of appending to the
list?

No.

$ python -m timeit -n20000 -s"items = []" "items.append(42)"
20000 loops, best of 3: 0.554 usec per loop
$ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
20000 loops, best of 3: 0.529 usec per loop
But it surely could, if the box happens to be out of memory and begins to swap;
that is not, of course, an issue with Python lists themselves...

--
_____________

Maric Michaud
Jun 30 '08 #3
Peter Otten wrote:
Ampedesign wrote:
If I happen to have a list that contains over 50,000 items, will the
size of the list severely impact the performance of appending to the
list?

No.

$ python -m timeit -n20000 -s"items = []" "items.append(42)"
20000 loops, best of 3: 0.554 usec per loop
$ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
20000 loops, best of 3: 0.529 usec per loop

http://wiki.python.org/moin/TimeComplexity

Peter
Peter,

So it's actually faster to append to a long list than to an empty one? That
certainly would not have been intuitively obvious, now would it?

-Larry
Jun 30 '08 #4
Larry Bates wrote:
Peter Otten wrote:
Ampedesign wrote:
If I happen to have a list that contains over 50,000 items, will the
size of the list severely impact the performance of appending to the
list?

No.

$ python -m timeit -n20000 -s"items = []" "items.append(42)"
20000 loops, best of 3: 0.554 usec per loop
$ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
20000 loops, best of 3: 0.529 usec per loop

http://wiki.python.org/moin/TimeComplexity

Peter

Peter,

So it's actually faster to append to a long list than to an empty one? That
certainly would not have been intuitively obvious, now would it?
You shouldn't blindly trust the numbers.

Here's what happens if I repeat the measurements a few times:

$ python -m timeit -n20000 -s"items = []" "items.append(42)"
20000 loops, best of 3: 0.531 usec per loop
$ python -m timeit -n20000 -s"items = []" "items.append(42)"
20000 loops, best of 3: 0.511 usec per loop
$ python -m timeit -n20000 -s"items = []" "items.append(42)"
20000 loops, best of 3: 0.512 usec per loop
$ python -m timeit -n20000 -s"items = []" "items.append(42)"
20000 loops, best of 3: 0.51 usec per loop
$ python -m timeit -n20000 -s"items = []" "items.append(42)"
20000 loops, best of 3: 0.514 usec per loop
$ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
20000 loops, best of 3: 0.506 usec per loop
$ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
20000 loops, best of 3: 0.512 usec per loop
$ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
20000 loops, best of 3: 0.543 usec per loop
$ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
20000 loops, best of 3: 0.522 usec per loop
$ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
20000 loops, best of 3: 0.51 usec per loop

The difference is within the error margin. All you can say is that both
operations take roughly the same time.

In general, if no error margin (e.g. 0.5 ± 0.1) is given, that is always a
warning sign, be it in opinion polls or timeit output.

Peter
Jun 30 '08 #5
Larry Bates wrote:
[...]
So it's actually faster to append to a long list than to an empty one? That
certainly would not have been intuitively obvious, now would it?
Maybe not intuitively, but if you know how dynamically growing data
structures are implemented, it's plausible. They overallocate, and the
amount of overallocation is dependent on the current size. Relevant
source snippet from Python 2.6:

/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
*/
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);

If, on the other hand, we knew beforehand how big the list will get
approximately, we could avoid all these reallocations. No problem with
Python's C API:

PyAPI_FUNC(PyObject *) PyList_New(Py_ssize_t size);

But you can't do it directly from Python, unless you (ab)use ctypes.
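You can, however, observe the over-allocation from pure Python without reading
the C source: sys.getsizeof() reports the list object's allocated size, which
only jumps when the internal buffer actually grows. A rough sketch (the exact
sizes and growth points vary with interpreter version and platform):

```python
import sys

# Append 100 items and record at which lengths the allocated size of the
# list jumps -- each jump corresponds to one realloc of the buffer.
items = []
last = sys.getsizeof(items)
growth_points = []
for i in range(100):
    items.append(i)
    size = sys.getsizeof(items)
    if size != last:
        growth_points.append(len(items))
        last = size

# Only a handful of the 100 appends triggered a reallocation.
print(growth_points)
```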

-- Gerhard

Jun 30 '08 #6
On Monday 30 June 2008 15:13:30, Larry Bates wrote:
Peter Otten wrote:
Ampedesign wrote:
If I happen to have a list that contains over 50,000 items, will the
size of the list severely impact the performance of appending to the
list?
No.

$ python -m timeit -n20000 -s"items = []" "items.append(42)"
20000 loops, best of 3: 0.554 usec per loop
$ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
20000 loops, best of 3: 0.529 usec per loop

http://wiki.python.org/moin/TimeComplexity

Peter

Peter,

So it's actually faster to append to a long list than to an empty one? That
certainly would not have been intuitively obvious, now would it?
That test only demonstrates that it's faster to append to a 1-million-item
list than to an empty one (and this on a particular platform with a particular
Python version). Different sizes may give different results. I guess this is
because of some internal optimisations (items are probably allocated in
chunks, so sometimes append() involves a realloc, sometimes not).

So the only thing you should remember is that list.append() has an amortized
complexity of O(1), and thus should be considered a constant-time operation
for any length. Just be aware of the note:

[1] = These operations rely on the "Amortized" part of "Amortized Worst Case".
Individual actions may take surprisingly long, depending on the history of
the container.
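A quick way to see the amortized behavior is to time building whole lists of
increasing length: the total time should scale roughly linearly, even though
individual appends occasionally pay for a reallocation. A sketch (written for
a modern Python with time.perf_counter; timings are noisy, so treat the
numbers as indicative only):

```python
import time

def build_time(n):
    """Return the wall-clock time to build a list of n items via append()."""
    start = time.perf_counter()
    items = []
    append = items.append  # avoid repeated attribute lookup in the loop
    for i in range(n):
        append(i)
    return time.perf_counter() - start

# Each tenfold increase in length should cost roughly tenfold more time.
for n in (10**4, 10**5, 10**6):
    print(n, build_time(n))
```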

Also note that 50,000 items is a lot for a human being, but not for a modern
computer.

--
Cédric Lucantis
Jun 30 '08 #7
On Monday 30 June 2008 15:52:56, Gerhard Häring wrote:
Larry Bates wrote:
[...]
So it's actually faster to append to a long list than to an empty one? That
certainly would not have been intuitively obvious, now would it?

Maybe not intuitively, but if you know how dynamically growing data
structures are implemented, it's plausible. They overallocate, and the
amount of overallocation is dependent on the current size. Relevant
source snippet from Python 2.6:

/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
*/
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);

If, on the other hand, we knew beforehand how big the list will get
approximately, we could avoid all these reallocations. No problem with
Python's C API:

PyAPI_FUNC(PyObject *) PyList_New(Py_ssize_t size);

But you can't do it directly from Python, unless you (ab)use ctypes.

-- Gerhard

Well, as I posted a few days ago, one could envisage, as a pure Python
optimization for dealing with long lists, replacing an algorithm with a lot
of appends by something like this:

mark = object()

datas = [ mark ] * expected_size

# working with the datas while maintaining the effective currently used size

Of course one could even subclass list and redefine __len__, append, and some
other methods to deal with this "allocated by block" list.
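A minimal sketch of what such a subclass could look like (the name BlockList
and its policy are made up here for illustration; a real version would need to
override more methods, and note that plain iteration still walks the unused
preallocated slots, since the C-level list iterator ignores __len__):

```python
class BlockList(list):
    """A list that preallocates slots and tracks its effective size itself."""

    def __init__(self, expected_size=0, filler=None):
        list.__init__(self, [filler] * expected_size)
        self._used = 0  # effective, currently used size

    def append(self, item):
        if self._used < list.__len__(self):
            self[self._used] = item      # reuse a preallocated slot
        else:
            list.append(self, item)      # fall back to normal growth
        self._used += 1

    def __len__(self):
        return self._used
```

Usage would look like `b = BlockList(1000); b.append(42)`, after which
`len(b)` reports 1 while the underlying storage stays at 1000 slots.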
--
_____________

Maric Michaud
Jun 30 '08 #8


Maric Michaud wrote:
On Monday 30 June 2008 15:52:56, Gerhard Häring wrote:
Larry Bates wrote:
If, on the other hand, we knew beforehand how big the list will get
approximately, we could avoid all these reallocations. No problem with
Python's C API:

PyAPI_FUNC(PyObject *) PyList_New(Py_ssize_t size);

But you can't do it directly from Python, unless you (ab)use ctypes.

-- Gerhard


Well, as I posted a few days ago, one could envisage, as a pure Python
optimization for dealing with long lists, replacing an algorithm with a lot
of appends by something like this:

mark = object()

datas = [ mark ] * expected_size
datas = [None] * expected_size
has been a standard idiom since before object() existed ;-)
and works fine *unless* one wants to add None explicitly
and have that be different from 'unused'.
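In practice the idiom pairs the preallocated list with an explicit fill index,
along these lines (a minimal sketch):

```python
# Preallocate with None and fill by index, tracking the used size manually.
expected_size = 8
datas = [None] * expected_size
used = 0

for value in (10, 20, 30):
    datas[used] = value
    used += 1

print(datas[:used])  # -> [10, 20, 30], the effective contents
print(len(datas))    # -> 8, the allocated size stays at expected_size
```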
# working with the datas while maintaining the effective currently used size

Of course one could even subclass list and redefine __len__, append, and some
other methods to deal with this "allocated by block" list.
An interesting idea if one does this at least a few times and wants to
use .append and .extend instead of explicit indexing.

One could also make such a subclass a 'no-grow' list if appropriate
(when an attempt to grow it would indicate a bug).

tjr

Jun 30 '08 #9
On Monday 30 June 2008 22:21:35, Terry Reedy wrote:
Well, as I posted few days ago, one could envisage, as a pure python
optimization for dealing with long list, to replace an algorithm with a
lot of append by something like this :

mark = object()

datas = [ mark ] * expected_size

datas = [None] * expected_size
has been a standard idiom since before object() existed ;-)
and works fine *unless* one wants to add None explicitly
and have that be different from 'unused'.

Yes, in fact I used a marker because I thought of it primarily as a boundary
for the list (like \0 for strings in C), but it doesn't matter what object
you put in the list, as long as you know its effective size at every moment.

A subclass of list will indeed have to override most of the methods of its
parent (not just some, as I assumed before), using extend() for reallocation
with some sort of sized iterator, as in my previous example with xrange,
something like this:

import itertools

class iter_with_len(object):
    def __init__(self, size, obj=None):
        self.size = size
        self.obj = obj

    def __len__(self):
        return self.size

    def __iter__(self):
        return itertools.repeat(self.obj, len(self))


--
_____________

Maric Michaud
Jun 30 '08 #10
