Why do custom objects take so much memory?

Hi,

After a couple of days of debugging a script, I found that some of the
assumptions I was making about the memory usage of my classes were
wrong. I wrote a simple script to isolate the problem:

import gc

class MyClass:
    def __init__(self, s):
        self.mystring = s

mylist = []
for i in range(1024*1024):
    mylist.append(MyClass(str(i)))  # allocation
# stage 1
mylist = None
gc.collect()
# stage 2

I measured the script's memory consumption at #stage 1 and #stage 2
and got:
#stage 1: 238 MB
#stage 2: 15 MB

That means every object is around 223 bytes in size!!!! That's too
much considering it only contains a string with a maximum size of 7
chars.

If you replace the allocation line with this one:
    mylist.append(str(i))  # no custom class; append the string directly to the list
the numbers drop substantially, to:
#stage 1: 47.6 MB
#stage 2: 15 MB
(so this time we can say string vars occupy around 32 bytes... still a
lot, isn't it?)

So, what's exactly going on behind the scenes? Why is using custom
objects SO expensive? What other ways of creating structures can be
used (cheaper in memory usage)?

Thanks a lot in advance!

Dec 18 '07 #1
On Dec 18, 2007 1:26 PM, jsanshef <js*******@gmail.com> wrote:
> Hi,
>
> After a couple of days of debugging a script, I found that some of the
> assumptions I was making about the memory usage of my classes were
> wrong. I wrote a simple script to isolate the problem:
>
> import gc
>
> class MyClass:
>     def __init__(self, s):
>         self.mystring = s
>
> mylist = []
> for i in range(1024*1024):
>     mylist.append(MyClass(str(i)))  # allocation
> # stage 1
> mylist = None
> gc.collect()
> # stage 2
>
> I measured the script's memory consumption at #stage 1 and #stage 2
> and got:
> #stage 1: 238 MB
> #stage 2: 15 MB
>
> That means every object is around 223 bytes in size!!!! That's too
> much considering it only contains a string with a maximum size of 7
> chars.
Class instances are fairly heavyweight: in your case you've got the
size of the PyObject struct, a dictionary for the instance's attributes
(which is itself another PyObject), the string object (yet another
PyObject), and the actual string data.
> If you replace the allocation line with this one:
>     mylist.append(str(i))  # no custom class; append the string directly to the list
> the numbers drop substantially, to:
> #stage 1: 47.6 MB
> #stage 2: 15 MB
> (so this time we can say string vars occupy around 32 bytes... still a
> lot, isn't it?)
String objects don't have dictionaries and are smaller than "regular"
Python objects.
> So, what's exactly going on behind the scenes? Why is using custom
> objects SO expensive? What other ways of creating structures can be
> used (cheaper in memory usage)?
If you're worried about per-instance memory costs Python is probably
not the language for your purposes. On the other hand, odds are that
you actually don't need to worry so much.

You can reduce the size of new-style classes (inherit from object) by
quite a bit if you use __slots__ to eliminate the class dictionary.
Dec 18 '07 #2
jsanshef <js*******@gmail.com> writes:
> That means every object is around 223 bytes in size!!!! That's too
> much considering it only contains a string with a maximum size of 7
> chars.
The list itself consumes 4 MB because it stores 1 million PyObject
pointers. It possibly consumes more due to overallocation, but let's
ignore that.

Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr
+ 4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's
not counting malloc overhead, which should be low since objects aren't
malloced individually. Each object requires a dict, which consumes
additional 52 bytes of memory (40 bytes for the dict struct plus 12
for gc). That's 88 bytes per object, not counting malloc overhead.

Then there's string allocation: your average string is 6 chars long;
add to that one additional char for the terminating zero. The string
struct takes up 20 bytes + string length, rounded to nearest
alignment. For your average case, that's 27 bytes, rounded (I assume) to 28.
You also allocate 1024*1024 integers which are never freed (they're
kept on a free list), and each of which takes up at least 12 bytes.

All that adds up to 128 bytes per object, dispersed over several
different object types. It doesn't surprise me that Python is eating
200+ MB of memory.
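
The estimate above can be checked with a few lines of arithmetic. This is only a sketch that reuses the byte counts quoted in the post (a 32-bit build of that era); the variable names are illustrative, and none of the figures are measured. Note that the itemised header fields (4 + 4 + 4 + 4 + 12) add up to 28 bytes, while the post's running totals use 36, so the sketch keeps 36 as stated.

```python
# Per-object estimate, using the byte counts quoted in the post above
# (32-bit build; these are the poster's figures, not measured values).
instance_bytes = 36      # instance struct incl. gc overhead, as stated
dict_bytes = 40 + 12     # dict struct + gc overhead
string_bytes = 28        # 20-byte header + 6 chars + NUL, rounded up
int_bytes = 12           # the int produced by range(), kept on a free list

per_object = instance_bytes + dict_bytes + string_bytes + int_bytes
count = 1024 * 1024
list_bytes = 4 * count   # one 4-byte pointer per list slot

total_mb = (per_object * count + list_bytes) / float(1024 * 1024)
print(per_object)   # 128
print(total_mb)     # 132.0
```

That accounts for roughly 132 MB of the 223 MB measured; the rest is presumably allocator overhead and list over-allocation, both of which the estimate deliberately leaves out.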
> So, what's exactly going on behind the scenes? Why is using custom
> objects SO expensive? What other ways of creating structures can be
> used (cheaper in memory usage)?
>
> Thanks a lot in advance!
Use a new-style class and set __slots__:

class MyClass(object):
    __slots__ = ('mystring',)
    def __init__(self, s):
        self.mystring = s

That brings down memory consumption to ~80MB, by cutting down the size
of object instance and removing the dict.
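
One way to check the saving on a current interpreter is sketched below. It assumes Python 3.4 or later, where the standard `tracemalloc` module is available (it did not exist when this thread was written). The class names `WithDict` and `WithSlots` and the helper `bytes_allocated` are made up for the example, and the per-object numbers it prints will differ from the 2007 figures above.

```python
import tracemalloc


class WithDict(object):
    """Ordinary class: each instance can carry its own attribute dict."""
    def __init__(self, s):
        self.mystring = s


class WithSlots(object):
    """Same data, but __slots__ removes the per-instance dict."""
    __slots__ = ('mystring',)

    def __init__(self, s):
        self.mystring = s


def bytes_allocated(cls, count):
    """Return the bytes traced while building `count` instances of `cls`."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    items = [cls(str(i)) for i in range(count)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del items
    return after - before


COUNT = 100000  # smaller than the thread's 1024*1024, to keep the run short
dict_total = bytes_allocated(WithDict, COUNT)
slots_total = bytes_allocated(WithSlots, COUNT)
print("with __dict__ : %.1f bytes/object" % (dict_total / float(COUNT)))
print("with __slots__: %.1f bytes/object" % (slots_total / float(COUNT)))
```

Both runs allocate the same strings and the same list, so the difference between the two totals is attributable to the instances themselves.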
Dec 18 '07 #3
In article <ma***************************************@python.org>,
Chris Mellon <ar*****@gmail.com> wrote:
>
> You can reduce the size of new-style classes (inherit from object) by
> quite a bit if you use __slots__ to eliminate the class dictionary.
You can also reduce your functionality quite a bit by using __slots__.
Someday I'll have time to write up a proper page about why you shouldn't
use __slots__....
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"Typing is cheap. Thinking is expensive." --Roy Smith
Dec 18 '07 #4
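
The post above does not say which functionality is lost. The following minimal sketch shows three commonly cited restrictions of a slotted class; the class name `Slotted` and the flag names are made up for the example, and this is not necessarily the list the poster had in mind.

```python
import weakref


class Slotted(object):
    __slots__ = ('mystring',)

    def __init__(self, s):
        self.mystring = s


obj = Slotted("42")

# 1. No per-instance dict, so attributes outside __slots__ are rejected.
try:
    obj.other = 1
    extra_attr_allowed = True
except AttributeError:
    extra_attr_allowed = False

# 2. There is no __dict__ to introspect or copy.
has_dict = hasattr(obj, '__dict__')

# 3. No weak references unless '__weakref__' is listed in __slots__.
try:
    weakref.ref(obj)
    weakref_allowed = True
except TypeError:
    weakref_allowed = False

print(extra_attr_allowed, has_dict, weakref_allowed)  # False False False
```

Whether these restrictions matter depends on the program; for a class that only ever stores one fixed attribute, as in the original script, none of them may be a problem.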
On Dec 18, 4:49 pm, a...@pythoncraft.com (Aahz) wrote:
> In article <mailman.2538.1198008758.13605.python-l...@python.org>,
> Chris Mellon <arka...@gmail.com> wrote:
>> You can reduce the size of new-style classes (inherit from object) by
>> quite a bit if you use __slots__ to eliminate the class dictionary.
>
> You can also reduce your functionality quite a bit by using __slots__.
> Someday I'll have time to write up a proper page about why you shouldn't
> use __slots__....
Shouting absolute commands without full understanding of the situation
is not going to help anyone.

The OP wanted to minimize memory usage, exactly the intended usage of
slots. Without knowing more about the OP's situation, I don't think
you or I or Chris Mellon can be sure it's not right for the OP's
situation.

You're obviously smart and highly expert in Python--I just wish you
would be more constructive with your advice.
Carl Banks
Dec 18 '07 #5
On Tue, 18 Dec 2007 21:13:14 +0100, Hrvoje Niksic wrote:
> Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr +
> 4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's not
> counting malloc overhead, which should be low since objects aren't
> malloced individually. Each object requires a dict, which consumes
> additional 52 bytes of memory (40 bytes for the dict struct plus 12 for
> gc). That's 88 bytes per object, not counting malloc overhead.
And let's not forget that if you're running on a 64-bit system, you can
double the size of every pointer.

Is there a canonical list of how much memory Python objects take up? Or a
canonical algorithm?

Or failing either of those, a good heuristic?

> Then there's string allocation: your average string is 6 chars long; add
> to that one additional char for the terminating zero.
Are you sure about that? If Python strings are zero terminated, how does
Python deal with this?
>>> 'a\0string'[1]
'\x00'


--
Steven
Dec 19 '07 #6
Steven D'Aprano wrote:
> On Tue, 18 Dec 2007 21:13:14 +0100, Hrvoje Niksic wrote:
>> Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr +
>> 4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's not
>> counting malloc overhead, which should be low since objects aren't
>> malloced individually. Each object requires a dict, which consumes
>> additional 52 bytes of memory (40 bytes for the dict struct plus 12 for
>> gc). That's 88 bytes per object, not counting malloc overhead.
>
> And let's not forget that if you're running on a 64-bit system, you can
> double the size of every pointer.
>
> Is there a canonical list of how much memory Python objects take up? Or a
> canonical algorithm?
Here is Martin v. Löwis giving a few pointers (pardon the pun):

http://mail.python.org/pipermail/pyt...ch/135223.html
http://groups.google.de/group/comp.l...e1de5b05?hl=de
> Or failing either of those, a good heuristic?
I thought there was a tool that tried to estimate this using hints from the
type, but Googling has availed me not.
>> Then there's string allocation: your average string is 6 chars long; add
>> to that one additional char for the terminating zero.
>
> Are you sure about that?
Yes. Look at Include/stringobject.h:

"""
Type PyStringObject represents a character string. An extra zero byte is
reserved at the end to ensure it is zero-terminated, but a size is
present so strings with null bytes in them can be represented. This
is an immutable object type.
"""
> If Python strings are zero terminated, how does
> Python deal with this?
> >>> 'a\0string'[1]
> '\x00'
It stores a length separate from the value. The 0-termination is a courtesy to C
APIs that expect 0-terminated strings. It does not define the end of the Python
string, though.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Dec 19 '07 #7
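
The difference between a stored length and a terminating zero, as described in the reply above, can be seen from Python itself. This sketch assumes Python 3, where the closest equivalent of the thread's `str` type is `bytes`; it uses `ctypes.c_char_p` only to stand in for a C API that reads up to the first NUL.

```python
import ctypes

s = b'a\0string'          # bytes literal with an embedded NUL byte

# Python uses the stored length, so the NUL is just another byte.
print(len(s))             # 8
print(s[1:2])             # b'\x00'

# A C-style consumer stops at the first NUL instead.
c_view = ctypes.c_char_p(s)
print(c_view.value)       # b'a'
```

The Python-level object keeps all eight bytes; only code that treats the buffer as a C string sees it cut short.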
Steven D'Aprano <st****@REMOVE.THIS.cybersource.com.au> writes:
> On Tue, 18 Dec 2007 21:13:14 +0100, Hrvoje Niksic wrote:
>> Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr +
>> 4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's not
>> counting malloc overhead, which should be low since objects aren't
>> malloced individually. Each object requires a dict, which consumes
>> additional 52 bytes of memory (40 bytes for the dict struct plus 12 for
>> gc). That's 88 bytes per object, not counting malloc overhead.
>
> And let's not forget that if you're running on a 64-bit system, you
> can double the size of every pointer.
And of Py_ssize_t's, longs, ints with padding (placed between two
pointers). Also note the price of 8-byte struct alignment.
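
The sizes in question can be queried for the running interpreter with the standard `struct` module. This is a small sketch assuming Python 3.3 or later (for the `"n"` format code); the variable names are illustrative, and the printed values depend on the platform.

```python
import struct

pointer_size = struct.calcsize("P")   # void *
ssize_size = struct.calcsize("n")     # ssize_t, the C type behind Py_ssize_t
long_size = struct.calcsize("l")      # C long
int_size = struct.calcsize("i")       # C int

# Native alignment: an int followed by a pointer is padded so that the
# pointer starts on its natural boundary.
unpadded = int_size + pointer_size
padded = struct.calcsize("iP")

print(pointer_size, ssize_size, long_size, int_size)
print(unpadded, padded)
```

On a typical 64-bit build the second line shows the padded layout as larger than the plain sum of the two fields, which is the padding cost mentioned above.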
> Is there a canonical list of how much memory Python objects take up?
> Or a canonical algorithm?
>
> Or failing either of those, a good heuristic?
For built-in types, you need to look at the code of each individual
object. For user types, you can approximate by calculations such as
the above.
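
Later CPython releases (2.6 onward) added `sys.getsizeof`, which reports the size of a single object without following its references. A rough heuristic for user types, in the spirit of the calculation above, is to add up the instance, its attribute dict and the attribute values. The helper name `approx_size` is made up for this sketch, and the numbers it prints vary by interpreter version and platform.

```python
import sys


class MyClass(object):
    def __init__(self, s):
        self.mystring = s


def approx_size(obj):
    """Shallow size of obj plus its __dict__ and attribute values.

    Rough heuristic only: nested containers are not followed and
    allocator overhead is ignored.
    """
    total = sys.getsizeof(obj)
    attrs = getattr(obj, '__dict__', None)
    if attrs is not None:
        total += sys.getsizeof(attrs)
        total += sum(sys.getsizeof(v) for v in attrs.values())
    return total


obj = MyClass(str(123456))
print(sys.getsizeof(obj))            # the instance alone
print(sys.getsizeof(obj.mystring))   # the string alone
print(approx_size(obj))              # instance + dict + attribute values
```

For a slotted class the `__dict__` branch is simply skipped, so the helper then reports only the instance itself.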
>> Then there's string allocation: your average string is 6 chars
>> long; add to that one additional char for the terminating zero.
>
> Are you sure about that? If Python strings are zero terminated, how
> does Python deal with this?
> >>> 'a\0string'[1]
> '\x00'
Python strings are zero-terminated so the pointer to string's data can
be passed to the various C APIs (this is standard practice, C++
strings do it too.) Python doesn't rely on zero termination to
calculate string length. So len('a\0string') will do the right thing,
but the string will internally store 'a\0string\0'.
Dec 19 '07 #8
Thank you all for your useful comments and suggestions!! They're a
great starting point to redesign my script completely ;)
Cheers!
Dec 19 '07 #9
Hrvoje Niksic wrote:
> Steven D'Aprano <st****@REMOVE.THIS.cybersource.com.au> writes:
>> On Tue, 18 Dec 2007 21:13:14 +0100, Hrvoje Niksic wrote:
>>> Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr +
>>> 4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's not
>>> counting malloc overhead, which should be low since objects aren't
>>> malloced individually. Each object requires a dict, which consumes
>>> additional 52 bytes of memory (40 bytes for the dict struct plus 12 for
>>> gc). That's 88 bytes per object, not counting malloc overhead.
>> And let's not forget that if you're running on a 64-bit system, you
>> can double the size of every pointer.
>
> And of Py_ssize_t's, longs, ints with padding (placed between two
> pointers). Also note the price of 8-byte struct alignment.
>> Is there a canonical list of how much memory Python objects take up?
>> Or a canonical algorithm?
>>
>> Or failing either of those, a good heuristic?
>
> For built-in types, you need to look at the code of each individual
> object. For user types, you can approximate by calculations such as
> the above.
It would be helpful if there were a tabulation of the memory cost for
each built-in type.

Colin W.
Dec 20 '07 #10
