Looking for info on Python's memory allocation

Can somebody help me please? I've spent a fruitless
hour googling with no luck.

I'm discussing memory allocation techniques with
somebody, and I'm trying to find a quote from -- I
think -- Tim Peters where he discusses the way Python
allocates memory when you append to lists. In basic
terms, he says that every time you try to append to a
list that is already full, Python doubles the size of
the list. This wastes no more than 50% of the memory
needed for that list, but has various advantages -- and
I'm damned if I can remember exactly what those
advantages were.

Can anyone point me in the right direction?

Thanks,
--
Steven.

Oct 11 '05 #1
http://mail.python.org/pipermail/pyt...er/198141.html

wherein Tim gently corrects my brash guess that Python lists are
pointer-linked. The example's linearly-constructed list is allocated by
doubling storage, copying, and freeing (cf. realloc). The result is that
the process's virtual memory ends up roughly twice the size of the list,
with the freed predecessor chunks left on the process heap but not
released to the operating system. So it only SEEMS to use as much memory
as pointer-linked elements would.
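A toy model of that doubling pattern (just a sketch of the idea, not
CPython's actual allocator) shows why the freed predecessor chunks add
up to about the size of the final list:

# Each resize allocates a new block twice the size and frees the old
# one, so the freed blocks are 1, 2, 4, ..., N/2 -- about N in total,
# and peak heap use is roughly 2N.
n = 2 ** 20
cap = 1
freed = []
while cap < n:
    freed.append(cap)   # old block returned to the heap after copying
    cap *= 2            # new block is twice the size
print cap, sum(freed)   # 1048576 1048575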
Hope this URL helps.

-Bob
"Steven D'Aprano" <st***@REMOVEMEcyber.com.au> wrote in message
news:43**************@REMOVEMEcyber.com.au...
Can somebody help me please? I've spent a fruitless
hour googling with no luck.

I'm discussing memory allocation techniques with
somebody, and I'm trying to find a quote from -- I
think -- Tim Peters where he discusses the way Python
allocates memory when you append to lists. In basic
terms, he says that every time you try to append to a
list that is already full, Python doubles the size of
the list. This wastes no more than 50% of the memory
needed for that list, but has various advantages -- and
I'm damned if I can remember exactly what those
advantages were.

Can anyone point me in the right direction?

Thanks,
--
Steven.


Oct 11 '05 #2
While there may be a discussion somewhere in the archives,
the comments in listobject.c are enlightening too.

/* Ensure ob_item has room for at least newsize elements, and set
* ob_size to newsize. If newsize > ob_size on entry, the content
* of the new slots at exit is undefined heap trash; it's the caller's
* responsibility to overwrite them with sane values.
* The number of allocated elements may grow, shrink, or stay the same.
* Failure is impossible if newsize <= self.allocated on entry, although
* that partly relies on an assumption that the system realloc() never
* fails when passed a number of bytes <= the number of bytes last
* allocated (the C standard doesn't guarantee this, but it's hard to
* imagine a realloc implementation where it wouldn't be true).
* Note that self->ob_item may change, and even if newsize is less
* than ob_size on entry.
*/
[...]
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
*/
"linear-time amortized behavior" (i.e., that list.append() is O(1) on average)
is the important characteristic you're probably looking for.

CVS source for listobject.c can be seen here:
http://cvs.sourceforge.net/viewcvs.p...27&view=markup
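For what it's worth, here's a small Python sketch (mine, not from the
source) that reproduces that growth pattern, assuming I've transcribed
the C arithmetic from list_resize() correctly:

def roundup(newsize):
    # Mirror of the 2.4 over-allocation formula:
    # new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6) + newsize
    if newsize < 9:
        extra = 3
    else:
        extra = 6
    return newsize + (newsize >> 3) + extra

allocated = 0
pattern = []
for n in range(1, 100):
    if n > allocated:          # list is full: resize, with headroom
        allocated = roundup(n)
        pattern.append(allocated)
print pattern   # [4, 8, 16, 25, 35, 46, 58, 72, 88, 106]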

Jeff


Oct 11 '05 #3
Steven D'Aprano enlightened us with:
he says that every time you try to append to a list that is already
full, Python doubles the size of the list. This wastes no more than
50% of the memory needed for that list, but has various advantages
-- and I'm damned if I can remember exactly what those advantages
were.


Allocating extra memory usually requires the following steps:

- Allocate more memory
- Copy all data from old to new memory
- Deallocate old memory

That second step requires N actions if there are N data items in the
list. If you add one 'block' of memory to the list for each item you
add, you waste no memory, but appending the Nth item requires copying
the N-1 items already there. That means that after you've added N items,
roughly N**2/2 steps have been taken -- quadratic time, which is very
bad for performance.

If, on the other hand, you double the memory every time you run out,
you have to copy much less data, and in the end it turns out you need
roughly N steps to add N items to the list. That's a lot better, isn't
it?
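You can watch the difference with a toy simulation (my own sketch, not
CPython's resize logic) that just counts element copies:

def copies(n, grow):
    # Count how many element copies n appends cost under a growth policy.
    capacity = size = moved = 0
    for i in range(n):
        if size == capacity:
            moved += size             # copy every existing item over
            capacity = grow(capacity)
        size += 1
    return moved

n = 100000
print copies(n, lambda c: c + 1)          # grow by 1: 4999950000, ~n**2/2
print copies(n, lambda c: max(1, 2 * c))  # double:    131071, under 2*n

Growing one slot at a time costs on the order of N**2 copies in total;
doubling keeps the total under 2*N.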

Sybren
--
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself?
Frank Zappa
Oct 11 '05 #4
Sybren Stuvel wrote:
Steven D'Aprano enlightened us with:
he says that every time you try to append to a list that is already
full, Python doubles the size of the list. This wastes no more than
<snip> If, on the other hand, you double the memory every time you run out,
you have to copy much less data, and in the end it turns out you need
roughly N steps to add N items to the list. That's a lot better, isn't
it?


This begs a different question along the same lines.

If I have a generator or other iterable producing a vast number of
items, and use it like this:

s = [k for k in iterable]

if I know beforehand how many items iterable would possibly yield, would
a construct like this be faster and "use" less memory?

s = [0] * len(iterable)
for i in xrange(len(iterable)):
    s[i] = iterable.next()

--
Lasse Vågsæther Karlsen
http://usinglvkblog.blogspot.com/
mailto:la***@vkarlsen.no
PGP KeyID: 0x2A42A1C2
Oct 11 '05 #5
Lasse Vågsæther Karlsen wrote:
If I have a generator or other iterable producing a vast number of
items, and use it like this:

s = [k for k in iterable]

if I know beforehand how many items iterable would possibly yield, would
a construct like this be faster and "use" less memory?

s = [0] * len(iterable)
for i in xrange(len(iterable)):
    s[i] = iterable.next()


You can easily answer the speed aspect of your question using the timeit
module:

~ $ python2.4 -m timeit -s'iterable=range(1000)' '[k for k in iterable]'
10000 loops, best of 3: 111 usec per loop

~ $ python2.4 -m timeit -s'iterable=range(1000)' 's = [0]*len(iterable); it = iter(iterable)' 'for i in xrange(len(iterable)): s[i] = it.next()'
1000 loops, best of 3: 513 usec per loop

~ $ python2.4 -m timeit -s'iterable=range(1000)' 's = [0]*len(iterable)' 'for i, v in enumerate(iterable): s[i] = v'
1000 loops, best of 3: 269 usec per loop

~ $ python2.4 -m timeit -s'iterable=range(1000)' 'list(iterable)'
100000 loops, best of 3: 7.33 usec per loop

Peter

Oct 11 '05 #6
On Tue, 11 Oct 2005 11:22:39 +0200, Lasse Vågsæther Karlsen wrote:
This begs a different question along the same lines.
Er, no it doesn't. "Begs the question" does _not_ mean "asks the question"
or "suggests the question". It means "assumes the truth of that which
needs to be proven".

http://en.wikipedia.org/wiki/Begging_the_question
http://www.worldwidewords.org/qa/qa-beg1.htm

(Both of these sources are far more forgiving of the modern mis-usage than
I am. Obviously.)

If I have a generator or other iterable producing a vast number of
items, and use it like this:

s = [k for k in iterable]

if I know beforehand how many items iterable would possibly yield, would
a construct like this be faster and "use" less memory?

s = [0] * len(iterable)
for i in xrange(len(iterable)):
    s[i] = iterable.next()


Faster? Maybe. Only testing can tell -- but I doubt it. But as for less
memory, look at the two situations.

In the first, you create a list of N objects.

In the second, you end up with the same list of N objects, plus an
xrange object -- which is small and of fixed size, unlike an ordinary
list of N integers.

So it won't use *less* memory -- at best, it will use just slightly more.

Is there a way from within Python to find out how much memory a single
object uses, and how much memory Python is using in total?
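(For the first half of that question: nothing built-in exists in 2.4,
but Python 2.6 and later grew sys.getsizeof(), which reports a single
object's own footprint in bytes -- a minimal sketch:

import sys

L = []
print sys.getsizeof(L)     # the empty list's own overhead, in bytes
L.extend(range(100))
print sys.getsizeof(L)     # grows with the over-allocated capacity

Note it counts only the list object itself, not the items it refers to.)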
--
Steven.

Oct 11 '05 #7
Steven D'Aprano <st***@REMOVETHIScyber.com.au> wrote:
...
s = [k for k in iterable]

if I know beforehand how many items iterable would possibly yield, would
a construct like this be faster and "use" less memory?

s = [0] * len(iterable)
for i in xrange(len(iterable)):
    s[i] = iterable.next()


Faster? Maybe. Only testing can tell -- but I doubt it. But as for less


Testing, of course, is trivially easy:

Helen:/tmp alex$ python -mtimeit -s'i=xrange(7)' 'L=[x for x in i]'
100000 loops, best of 3: 6.65 usec per loop

Helen:/tmp alex$ python -mtimeit -s'i=xrange(7)' 'L=list(i)'
100000 loops, best of 3: 5.26 usec per loop

So, using list() instead of that silly list comprehension does have an
easily measurable advantage. To check the complicated option...:

Helen:/tmp alex$ python -mtimeit -s'i=xrange(7)' '
s = [0]*7
u = iter(i)
for x in xrange(7):
    s[x] = u.next()
'
10000 loops, best of 3: 24.7 usec per loop

So, the "advantage" of all the complications is to slow down execution
by about four times -- net of all the juicy possibilities for confusion
and errors (there is no common Python type on which you can both call
len(...) AND the .next() method, for example -- a combination which
really makes no sense).

*SIMPLICITY* is the keyword here -- list(...) is by far the simplest, and
almost five times faster than the baroque gyrations above...
Alex
Oct 11 '05 #8
Alex Martelli wrote:
(there is no common Python type on which you can both call
len(...) AND the .next() method, for example -- a combination
which really makes no sense).

>>> L = [1, 2, 3]
>>> len(L)
3
>>> I = iter(L)
>>> I
<listiterator object at 0x0091ABD0>
>>> len(I)
3
>>> I.next()
1
>>> len(I)
2
>>> I.next()
2
>>> len(I)
1
>>> I.next()
3
>>> len(I)
0
>>> I.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration

(it's probably not a good idea to rely on this behaviour...)
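That len() support is, if I'm reading the intent right, a size hint so
that consumers like list() can preallocate instead of growing as they
go -- roughly this idea, sketched in Python (not CPython's actual C
code):

def tolist(iterable):
    it = iter(iterable)
    try:
        n = len(it)              # works for listiterators in 2.4, as above
    except TypeError:
        return [x for x in it]   # no size hint: append as we go
    result = [None] * n          # preallocate once
    for i in range(n):
        result[i] = it.next()
    return result

print tolist([1, 2, 3])   # [1, 2, 3]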

</F>

Oct 11 '05 #9
Fredrik Lundh wrote:
>>> I = iter(L)
>>> I
<listiterator object at 0x0091ABD0>
>>> len(I)
3

[...] (it's probably not a good idea to rely on this behaviour...)


I believe this has been classified as a bug in Python 2.4, which
will be undone in Python 2.5.

Regards,
Martin
Oct 12 '05 #10
Steven D'Aprano <st***@REMOVEMEcyber.com.au> writes:
I'm discussing memory allocation techniques with somebody, and I'm trying to
find a quote from -- I think -- Tim Peters where he discusses the way Python
allocates memory when you append to lists. In basic terms, he says that every
time you try to append to a list that is already full, Python doubles the size
of the list. This wastes no more than 50% of the memory needed for that list,
but has various advantages -- and I'm damned if I can remember exactly what
those advantages were.


That method of allocating memory is not unique to Python. The idea is
that if you double the allocation each time you run out, the interval
between allocations grows exponentially (assuming storage use grows
linearly).

Memory allocation is often very expensive, because the OS often has to
search for a large enough free block to hand to the application. What is
even more expensive is the coalescing of free blocks, which happens
every now and then.

I guess you could do the same by tripling memory usage every time you
need more memory. This would reduce the number of allocations even
further. But the problem is you'd waste even more memory -- up to 2/3 of
it, in the worst case. So doubling the size of chunks is used, and the
technique is quite common.
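The trade-off is easy to tabulate with a toy model (my sketch, not any
real allocator):

def stats(n, factor):
    # Returns (element copies, slots wasted at the end) for n appends.
    capacity, size, moved = 1, 0, 0
    for i in range(n):
        if size == capacity:
            moved += size    # copy everything into the new block
            capacity = max(capacity + 1, int(capacity * factor))
        size += 1
    return moved, capacity - size

for factor in (1.5, 2.0, 3.0):
    print factor, stats(1000000, factor)

Bigger factors copy less but strand more memory at the end: tripling can
leave up to 2/3 of the final block unused, which is why doubling (or
something milder still, as in CPython's listobject.c) is the usual
compromise.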

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!
You shouldn't verb verbs.
Nov 1 '05 #11
