
Looking for info on Python's memory allocation

Can somebody help me please? I've spent a fruitless
hour googling with no luck.

I'm discussing memory allocation techniques with
somebody, and I'm trying to find a quote from -- I
think -- Tim Peters where he discusses the way Python
allocates memory when you append to lists. In basic
terms, he says that every time you try to append to a
list that is already full, Python doubles the size of
the list. This wastes no more than 50% of the memory
needed for that list, but has various advantages -- and
I'm damned if I can remember exactly what those
advantages were.

Can anyone point me in the right direction?

Thanks,
--
Steven.

Oct 11 '05 #1
10 Replies


http://mail.python.org/pipermail/pyt...er/198141.html

wherein Tim gently corrects my brash guess that Python lists are
pointer-linked.
The example's linearly-constructed list is grown by doubling its storage, copying, and freeing the old block (cf. realloc()).
The result is that the process's virtual memory ends up roughly twice the size of the list, with the freed predecessor chunks left on the process heap rather than released to the operating system.
So it only SEEMS to use as much memory as pointer-linked elements would.
Hope this URL helps.

-Bob
"Steven D'Aprano" <st***@REMOVEMEcyber.com.au> wrote in message
news:43**************@REMOVEMEcyber.com.au...
Can somebody help me please? I've spent a fruitless
hour googling with no luck.

I'm discussing memory allocation techniques with
somebody, and I'm trying to find a quote from -- I
think -- Tim Peters where he discusses the way Python
allocates memory when you append to lists. In basic
terms, he says that every time you try to append to a
list that is already full, Python doubles the size of
the list. This wastes no more than 50% of the memory
needed for that list, but has various advantages -- and
I'm damned if I can remember exactly what those
advantages were.

Can anyone point me in the right direction?

Thanks,
--
Steven.


Oct 11 '05 #2

While there may be a discussion somewhere in the archives,
the comments in listobject.c are enlightening too.

/* Ensure ob_item has room for at least newsize elements, and set
* ob_size to newsize. If newsize > ob_size on entry, the content
* of the new slots at exit is undefined heap trash; it's the caller's
* responsiblity to overwrite them with sane values.
* The number of allocated elements may grow, shrink, or stay the same.
* Failure is impossible if newsize <= self.allocated on entry, although
* that partly relies on an assumption that the system realloc() never
* fails when passed a number of bytes <= the number of bytes last
* allocated (the C standard doesn't guarantee this, but it's hard to
* imagine a realloc implementation where it wouldn't be true).
* Note that self->ob_item may change, and even if newsize is less
* than ob_size on entry.
*/
[...]
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
*/
"linear-time amortized behavior" (i.e., that list.append() is O(1) on average)
is the important characteristic you're probably looking for.

CVS source for listobject.c can be seen here:
http://cvs.sourceforge.net/viewcvs.p...27&view=markup
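
For the curious, here is a small Python sketch of the over-allocation arithmetic behind the growth pattern quoted above. The expression is a translation of the one in that era's list_resize(); treat it as an illustration rather than the authoritative source.

def new_allocated(newsize):
    # Mild proportional over-allocation: roughly newsize + newsize/8 plus a
    # small constant, which yields 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
    return (newsize >> 3) + (3 if newsize < 9 else 6) + newsize

def growth_pattern(n_appends):
    # Simulate appending one item at a time and record each new allocation.
    allocated, sizes = 0, [0]
    for size in range(1, n_appends + 1):
        if size > allocated:          # the list is full, so resize
            allocated = new_allocated(size)
            sizes.append(allocated)
    return sizes

print(growth_pattern(100))
# -> [0, 4, 8, 16, 25, 35, 46, 58, 72, 88, 106]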

Jeff


Oct 11 '05 #3

Steven D'Aprano enlightened us with:
he says that every time you try to append to a list that is already
full, Python doubles the size of the list. This wastes no more than
50% of the memory needed for that list, but has various advantages
-- and I'm damned if I can remember exactly what those advantages
were.


Allocating extra memory usually requires the following steps:

- Allocate more memory
- Copy all data from old to new memory
- Deallocate old memory

That second step requires N actions if there are N data items in the
list. If you add one 'block' of memory to the list for each item you
add, you waste no memory, but every append has to copy every item
already in the list. That means that after you've added N items,
roughly N**2/2 copy steps have been taken -- quadratic growth, which is
very bad for performance.

If, on the other hand, you double the memory every time you run out,
you have to copy much less data, and in the end it turns out you need
roughly N steps to add N items to the list. That's a lot better, isn't
it?
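
To put rough numbers on that argument, here is a quick simulation (a sketch added for illustration, not part of the original reply) that counts element copies under the two growth strategies:

def copies_needed(n_items, grow):
    # Simulate n_items appends; grow(capacity) gives the new capacity
    # whenever the current block is full, and each resize copies every
    # element already stored.
    capacity = size = copies = 0
    for _ in range(n_items):
        if size == capacity:
            copies += size
            capacity = grow(capacity)
        size += 1
    return copies

n = 100000
print(copies_needed(n, lambda c: c + 1))        # grow by one slot: ~n**2/2 copies
print(copies_needed(n, lambda c: max(1, 2*c)))  # double the capacity: O(n) copies in total (131071 here)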

Sybren
--
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself?
Frank Zappa
Oct 11 '05 #4

Sybren Stuvel wrote:
Steven D'Aprano enlightened us with:
he says that every time you try to append to a list that is already
full, Python doubles the size of the list. This wastes no more than
<snip>
If, on the other hand, you double the memory every time you run out,
you have to copy much less data, and in the end it turns out you need
roughly N steps to add N items to the list. That's a lot better, isn't
it?


This begs a different question along the same lines.

If I have a generator or other iterable producing a vast number of
items, and use it like this:

s = [k for k in iterable]

if I know beforehand how many items iterable would possibly yield, would
a construct like this be faster and "use" less memory?

s = [0] * len(iterable)
for i in xrange(len(iterable)):
    s[i] = iterable.next()

--
Lasse Vågsæther Karlsen
http://usinglvkblog.blogspot.com/
mailto:la***@vkarlsen.no
PGP KeyID: 0x2A42A1C2
Oct 11 '05 #5

Lasse Vågsæther Karlsen wrote:
If I have a generator or other iterable producing a vast number of
items, and use it like this:

s = [k for k in iterable]

if I know beforehand how many items iterable would possibly yield, would
a construct like this be faster and "use" less memory?

s = [0] * len(iterable)
for i in xrange(len(iterable)):
    s[i] = iterable.next()


You can easily answer the speed aspect of your question using the timeit
module:

~ $ python2.4 -m timeit -s'iterable=range(1000)' '[k for k in iterable]'
10000 loops, best of 3: 111 usec per loop

~ $ python2.4 -m timeit -s'iterable=range(1000)' 's = [0]*len(iterable); it
= iter(iterable)' 'for i in xrange(len(iterable)): s[i] = it.next()'
1000 loops, best of 3: 513 usec per loop

~ $ python2.4 -m timeit -s'iterable=range(1000)' 's = [0]*len(iterable)'
'for i, v in enumerate(iterable): s[i] = v'
1000 loops, best of 3: 269 usec per loop

~ $ python2.4 -m timeit -s'iterable=range(1000)' 'list(iterable)'
100000 loops, best of 3: 7.33 usec per loop

Peter

Oct 11 '05 #6

On Tue, 11 Oct 2005 11:22:39 +0200, Lasse Vågsæther Karlsen wrote:
This begs a different question along the same lines.
Er, no it doesn't. "Begs the question" does _not_ mean "asks the question"
or "suggests the question". It means "assumes the truth of that which
needs to be proven".

http://en.wikipedia.org/wiki/Begging_the_question
http://www.worldwidewords.org/qa/qa-beg1.htm

(Both of these sources are far more forgiving of the modern mis-usage than
I am. Obviously.)

If I have a generator or other iterable producing a vast number of
items, and use it like this:

s = [k for k in iterable]

if I know beforehand how many items iterable would possibly yield, would
a construct like this be faster and "use" less memory?

s = [0] * len(iterable)
for i in xrange(len(iterable)):
    s[i] = iterable.next()


Faster? Maybe. Only testing can tell -- but I doubt it. But as for less
memory, look at the two situations.

In the first, you create a list of N objects.

In the second, you end up with the same list of N objects, plus an xrange
object -- which is a small, constant-size object rather than a list of N
integers.

So it won't use *less* memory -- at best, it will use just slightly more.

Is there a way from within Python to find out how much memory a single
object uses, and how much memory Python is using in total?
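
As an aside, and well after this thread: Python 2.6 added sys.getsizeof(), which answers the per-object half of that question. A small sketch, with the caveat that it reports only an object's own footprint, not that of the objects it references:

import sys

L = list(range(1000))
print(sys.getsizeof(L))        # the list object itself: header plus pointer array
print(sys.getsizeof(iter(L)))  # a list iterator is tiny, whatever the list's length
# getsizeof() does not follow references, so the 1000 integers are not counted.
# There is still no single built-in for the interpreter's total footprint;
# resource.getrusage() or, in 3.4+, tracemalloc are the usual routes.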
--
Steven.

Oct 11 '05 #7

Steven D'Aprano <st***@REMOVETHIScyber.com.au> wrote:
...
s = [k for k in iterable]

if I know beforehand how many items iterable would possibly yield, would
a construct like this be faster and "use" less memory?

s = [0] * len(iterable)
for i in xrange(len(iterable)):
    s[i] = iterable.next()


Faster? Maybe. Only testing can tell -- but I doubt it. But as for less


Testing, of course, is trivially easy:

Helen:/tmp alex$ python -mtimeit -s'i=xrange(7)' 'L=[x for x in i]'
100000 loops, best of 3: 6.65 usec per loop

Helen:/tmp alex$ python -mtimeit -s'i=xrange(7)' 'L=list(i)'
100000 loops, best of 3: 5.26 usec per loop

So, using list() instead of that silly list comprehension does have an
easily measurable advantage. To check the complicated option...:

Helen:/tmp alex$ python -mtimeit -s'i=xrange(7)' '
s = [0]*7
u = iter(i)
for x in xrange(7):
    s[x] = u.next()
'
10000 loops, best of 3: 24.7 usec per loop

So, the "advantage" of all the complications is to slow down execution
by about four times -- net of all the juicy possibilities for confusion
and errors (there is no common Python type on which you can both call
len(...) AND the .next() method, for example -- a combination which
really makes no sense).

*SIMPLICITY* is the keyword here -- list(...) is by far the simplest, and
almost five times faster than the baroque gyrations above...
Alex
Oct 11 '05 #8

Alex Martelli wrote:
(there is no common Python type on which you can both call
len(...) AND the .next() method, for example -- a combination
which really makes no sense).

>>> L = [1, 2, 3]
>>> len(L)
3
>>> I = iter(L)
>>> I
<listiterator object at 0x0091ABD0>
>>> len(I)
3
>>> I.next()
1
>>> len(I)
2
>>> I.next()
2
>>> len(I)
1
>>> I.next()
3
>>> len(I)
0
>>> I.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration

(it's probably not a good idea to rely on this behaviour...)

</F>

Oct 11 '05 #9

Fredrik Lundh wrote:
>>> I = iter(L)
>>> I
<listiterator object at 0x0091ABD0>
>>> len(I)
3

[...] (it's probably not a good idea to rely on this behaviour...)


I believe this has been classified as a bug in Python 2.4, which
will be undone in Python 2.5.
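
For the record, the len() support was indeed dropped, and the remaining-items count later resurfaced as a hint rather than a length. In modern Pythons (this is an added illustration, not from the thread) the behaviour looks roughly like this:

import operator

it = iter([1, 2, 3])
try:
    len(it)                          # the 2.4 behaviour quoted above is gone
except TypeError as exc:
    print(exc)                       # list_iterator has no len()
print(operator.length_hint(it))      # the estimate survives as a hint (3.4+): prints 3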

Regards,
Martin
Oct 12 '05 #10

Steven D'Aprano <st***@REMOVEMEcyber.com.au> writes:
I'm discussing memory allocation techniques with somebody, and I'm trying to
find a quote from -- I think -- Tim Peters where he discusses the way Python
allocates memory when you append to lists. In basic terms, he says that every
time you try to append to a list that is already full, Python doubles the size
of the list. This wastes no more than 50% of the memory needed for that list,
but has various advantages -- and I'm damned if I can remember exactly what
those advantages were.


That method of allocating memory is not used only in Python. The idea is that
if you double the allocated memory every time you run out, the interval between
allocations grows exponentially (assuming storage usage grows roughly linearly).

Memory allocation is often very expensive, because the OS (or the allocator)
often has to search for a large enough free block to give to the application.
What is even more expensive is coalescing free blocks, which happens every now
and then.

I guess you could do the same by tripling the memory every time you need more.
This would reduce the number of allocations even further. But the problem is
you'd waste even more memory -- up to 2/3 of the block, in fact. So doubling the
size of chunks is what gets used, and the technique is quite common.
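
A quick sketch of that trade-off, under the simplifying assumption that the allocator grows a block by a fixed factor starting from a capacity of one:

def growth_stats(n_items, factor):
    # Count reallocations and the wasted fraction after growing to n_items.
    capacity, reallocs = 1, 0
    while capacity < n_items:
        capacity *= factor
        reallocs += 1
    wasted = (capacity - n_items) / float(capacity)
    return reallocs, capacity, wasted

for factor in (2, 3):
    print(factor, growth_stats(10**6, factor))
# factor 2: 20 reallocations, capacity 1048576, about 5% wasted for this n
# factor 3: 13 reallocations, capacity 1594323, about 37% wasted for this n

In the worst case a factor of two leaves up to half the block unused and a factor of three up to two thirds, which is the trade-off described above.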

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!
You shouldn't verb verbs.
Nov 1 '05 #11
