Why custom objects take so much memory?

Hi,

after a couple of days of script debugging, I found that some
assumptions I was making about the memory footprint of my classes are
not true. I wrote a simple script to isolate the problem:

    import gc

    class MyClass:
        def __init__(self, s):
            self.mystring = s

    mylist = []
    for i in range(1024*1024):
        mylist.append(MyClass(str(i)))  # allocation
    # stage 1
    mylist = None
    gc.collect()
    # stage 2

I measure the memory consumption of the script at #stage 1 and
#stage 2 and I obtain:
#stage 1 - 238 MB
#stage 2 - 15 MB

That means every object is around 223 bytes in size!!!! That's too
much considering it only contains a string with a maximum size of 7
chars.

If you replace the allocation line with this one:
    mylist.append(str(i))  # we don't create the custom class, but append the string directly to the list
the numbers decrease substantially to:
#stage 1 - 47.6 MB
#stage 2 - 15 MB
(so this time we can say each string occupies around 32 bytes... still a
lot, isn't it?)

So, what's exactly going on behind the scenes? Why is using custom
objects SO expensive? What other ways of creating structures can be
used (cheaper in memory usage)?

Thanks a lot in advance!

Dec 18 '07
Hrvoje Niksic wrote:
Steven D'Aprano <st****@REMOVE.THIS.cybersource.com.au> writes:
>On Tue, 18 Dec 2007 21:13:14 +0100, Hrvoje Niksic wrote:
>>Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr +
4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's not
counting malloc overhead, which should be low since objects aren't
malloced individually. Each object requires a dict, which consumes
additional 52 bytes of memory (40 bytes for the dict struct plus 12 for
gc). That's 88 bytes per object, not counting malloc overhead.
And let's not forget that if you're running on a 64-bit system, you
can double the size of every pointer.

And of Py_ssize_t's, longs, ints with padding (placed between two
pointers). Also note the price of 8-byte struct alignment.
>Is there a canonical list of how much memory Python objects take up?
Or a canonical algorithm?

Or failing either of those, a good heuristic?

For built-in types, you need to look at the code of each individual
object. For user types, you can approximate by calculations such as
the above.
It would be helpful if there were a
tabulation of the memory cost for
each built-in type.

Colin W.
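
A rough tabulation of the kind asked for above can be sketched with sys.getsizeof, which was added in Python 2.6, after this thread was written. The sketch below is written for Python 3; the helper name shallow_sizes and the choice of sample objects are assumptions made for illustration. The reported sizes are shallow (an instance's size does not include its __dict__ or the string it refers to) and vary between Python versions and between 32- and 64-bit builds, so no expected numbers are shown.

```python
import sys


class MyClass:
    def __init__(self, s):
        self.mystring = s


def shallow_sizes():
    """Return shallow sizes in bytes, as reported by sys.getsizeof."""
    obj = MyClass("123456")
    samples = {
        "int": 1,
        "str (6 chars)": "123456",
        "empty list": [],
        "empty dict": {},
        "MyClass instance": obj,
        # The per-instance attribute dictionary is a separate object.
        "instance __dict__": obj.__dict__,
    }
    return {name: sys.getsizeof(value) for name, value in samples.items()}


if __name__ == "__main__":
    for name, size in shallow_sizes().items():
        print(f"{name:20s} {size:4d} bytes")
```

Adding the instance size, its __dict__ size and the string size gives an approximation in the same spirit as the hand calculation quoted above.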
>
>>Then there's string allocation: your average string is 6 chars
long; add to that one additional char for the terminating zero.
Are you sure about that? If Python strings are zero terminated, how
does Python deal with this?
>>> 'a\0string'[1]
'\x00'

Python strings are zero-terminated so the pointer to string's data can
be passed to the various C APIs (this is standard practice, C++
strings do it too.) Python doesn't rely on zero termination to
calculate string length. So len('a\0string') will do the right thing,
but the string will internally store 'a\0string\0'.
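
The point that the length is stored rather than derived from a terminator can be checked from Python itself. This is a small sketch for Python 3; the variable names are only illustrative.

```python
# A string with an embedded NUL character in the middle.
s = "a\0string"

# The length counts every character, including the embedded NUL;
# the trailing terminator kept for C APIs is not visible here.
length = len(s)

# Indexing returns the NUL character like any other character.
second = s[1]

print(length, repr(second))
```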
Dec 20 '07 #11
In article <08**********************************@l32g2000hse.googlegroups.com>,
Carl Banks <pa************@gmail.com> wrote:
>On Dec 18, 4:49 pm, a...@pythoncraft.com (Aahz) wrote:
>In article <mailman.2538.1198008758.13605.python-l...@python.org>,
Chris Mellon <arka...@gmail.com> wrote:
>>>
You can reduce the size of new-style classes (inherit from object) by
quite a bit if you use __slots__ to eliminate the class dictionary.

You can also reduce your functionality quite a bit by using __slots__.
Someday I'll have time to write up a proper page about why you shouldn't
use __slots__....

Shouting absolute commands without full understanding of the situation
is not going to help anyone.
Maybe not, but at least it will get people to stop for a bit.
>The OP wanted to minimize memory usage, exactly the intended usage of
slots. Without knowing more about the OP's situation, I don't think
you or I or Chris Mellon can be sure it's not right for the OP's
situation.
The whole point about warning against __slots__ is that you should never
use them unless you are certain they're the best solution. Consider what
happens when the OP wants to subclass this __slots__-using class.
Avoiding __slots__ will almost never harm anyone, so I feel completely
comfortable sticking with a blanket warning.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"Typing is cheap. Thinking is expensive." --Roy Smith
Dec 21 '07 #12
On 20 Dec 2007 19:50:31 -0800, Aahz <aa**@pythoncraft.com> wrote:
In article <08**********************************@l32g2000hse.googlegroups.com>,
Carl Banks <pa************@gmail.com> wrote:
On Dec 18, 4:49 pm, a...@pythoncraft.com (Aahz) wrote:
In article <mailman.2538.1198008758.13605.python-l...@python.org>,
Chris Mellon <arka...@gmail.com> wrote:

You can reduce the size of new-style classes (inherit from object) by
quite a bit if you use __slots__ to eliminate the class dictionary.

You can also reduce your functionality quite a bit by using __slots__.
Someday I'll have time to write up a proper page about why you shouldn't
use __slots__....
Shouting absolute commands without full understanding of the situation
is not going to help anyone.

Maybe not, but at least it will get people to stop for a bit.
No, it will just make people start ignoring you because you give
inappropriate advice forbidding reasonable solutions without a
rational justification.
The OP wanted to minimize memory usage, exactly the intended usage of
slots. Without knowing more about the OP's situation, I don't think
you or I or Chris Mellon can be sure it's not right for the OP's
situation.

The whole point about warning against __slots__ is that you should never
use them unless you are certain they're the best solution. Consider what
happens when the OP wants to subclass this __slots__-using class.
Nothing. Subclasses of a class with __slots__ get a dict just like
anything else, unless they also define slots. Why do you think you can
subclass object to get something you can stick arbitrary attributes
on?
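
The subclassing behaviour described here can be demonstrated with a short sketch. It is written for Python 3, and the class names Slotted, Child and SlottedChild and the helper can_set_arbitrary_attribute are made up for illustration.

```python
class Slotted:
    # Only the listed attribute is allowed; instances have no __dict__.
    __slots__ = ("mystring",)

    def __init__(self, s):
        self.mystring = s


class Child(Slotted):
    # No __slots__ here, so instances get a __dict__ again.
    pass


class SlottedChild(Slotted):
    # Defining __slots__ in the subclass keeps instances dict-free.
    __slots__ = ("extra",)


def can_set_arbitrary_attribute(obj):
    """Return True if an attribute not named in any __slots__ can be set."""
    try:
        obj.anything = 1
    except AttributeError:
        return False
    return True


if __name__ == "__main__":
    for cls in (Slotted, Child, SlottedChild):
        obj = cls("x")
        print(cls.__name__, hasattr(obj, "__dict__"), can_set_arbitrary_attribute(obj))
```

So subclassing still works, but the memory saving is lost for any subclass that does not declare its own __slots__.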
Avoiding __slots__ will almost never harm anyone, so I feel completely
comfortable sticking with a blanket warning.
--
Barking out your blanket warning in a thread on *the exact use case
slots were implemented to address* just makes you look like a mindless
reactionary. Stick to posting your warning in threads where people ask
how to stop "other people" from setting attributes on "their"
instances.
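
For the memory use case itself, a rough comparison along the lines of the original poster's script can be sketched with tracemalloc (Python 3.4 and later, so not available when this thread was written). The names Plain, Slotted and bytes_per_instance are assumptions for illustration, and the figures include the list slot and the string held by each instance, so they are only meaningful relative to each other.

```python
import gc
import tracemalloc


class Plain:
    def __init__(self, s):
        self.mystring = s


class Slotted:
    __slots__ = ("mystring",)

    def __init__(self, s):
        self.mystring = s


def bytes_per_instance(cls, n=20000):
    """Approximate traced bytes per element for a list of n instances."""
    gc.collect()
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    items = [cls(str(i)) for i in range(n)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del items
    return (after - before) / n


if __name__ == "__main__":
    print("plain  :", bytes_per_instance(Plain))
    print("slotted:", bytes_per_instance(Slotted))
```

The absolute numbers depend on the interpreter version and platform; the point of the sketch is only the difference between the two classes.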
Dec 21 '07 #13
Chris Mellon wrote:
On 20 Dec 2007 19:50:31 -0800, Aahz <aa**@pythoncraft.com> wrote:
>In article <08**********************************@l32g2000hse.googlegroups.com>,
Carl Banks <pa************@gmail.com> wrote:
>>>Someday I'll have time to write up a proper page about why you shouldn't
use __slots__....
Barking out your blanket warning in a thread on *the exact use case
slots were implemented to address* just makes you look like a mindless
reactionary. Stick to posting your warning in threads where people ask
how to stop "other people" from setting attributes on "their"
instances.
Agreed.

I'd like to hear more about what kind of performance gain can be
obtained from "__slots__". I'm looking into ways of speeding up
HTML parsing via BeautifulSoup. If a significant speedup can be
obtained when navigating large trees of small objects, that's worth
quite a bit to me.

John Nagle
SiteTruth
Dec 21 '07 #14
John Nagle wrote:
I'd like to hear more about what kind of performance gain can be
obtained from "__slots__". I'm looking into ways of speeding up
HTML parsing via BeautifulSoup. If a significant speedup can be
obtained when navigating large trees of small objects, that's worth
quite a bit to me.
The following micro-benchmarks are from Python 2.5 on a Core Duo
machine. C0 is an old-style class, C1 is a new-style class, C2 is a
new-style class using __slots__:

# read access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.133 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.184 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.161 usec per loop

# write access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib = 1"
10000000 loops, best of 3: 0.15 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.217 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.209 usec per loop

$ more q.py
class C0:
    pass

class C1(object):
    pass

class C2(object):
    __slots__ = ["attrib"]

Your mileage may vary.
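
The same kind of read-access measurement can be reproduced from inside Python with the timeit module instead of the command line. This is a sketch for Python 3, where old-style classes such as C0 no longer exist, so only C1 and C2 are compared; the helper name time_attribute_read and the loop counts are assumptions.

```python
import timeit


class C1(object):
    pass


class C2(object):
    __slots__ = ["attrib"]


def time_attribute_read(cls, number=200000):
    """Return the best-of-3 time in seconds for one read of o.attrib."""
    o = cls()
    o.attrib = 1
    timings = timeit.repeat("o.attrib", globals={"o": o}, repeat=3, number=number)
    return min(timings) / number


if __name__ == "__main__":
    for cls in (C1, C2):
        usec = time_attribute_read(cls) * 1e6
        print("%s: %.3f usec per loop" % (cls.__name__, usec))
```

As with the figures above, results differ between machines and interpreter versions.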
I'm looking into ways of speeding up HTML parsing via BeautifulSoup.
The solution to that is spelled "lxml".

</F>

Dec 21 '07 #15
My mileage does vary, see this older post

<http://mail.python.org/pipermail/python-list/2004-May/261985.html>

Similar figures are shown with Python 2.5, both for 32- and 64-bit.

/Jean Brouwers
On Dec 21, 12:07 pm, Fredrik Lundh <fred...@pythonware.com> wrote:
John Nagle wrote:
I'd like to hear more about what kind of performance gain can be
obtained from "__slots__". I'm looking into ways of speeding up
HTML parsing via BeautifulSoup. If a significant speedup can be
obtained when navigating large trees of small objects, that's worth
quite a bit to me.

The following micro-benchmarks are from Python 2.5 on a Core Duo
machine. C0 is an old-style class, C1 is a new-style class, C2 is a
new-style class using __slots__:

# read access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.133 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.184 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.161 usec per loop

# write access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib = 1"
10000000 loops, best of 3: 0.15 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.217 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.209 usec per loop

$ more q.py
class C0:
    pass

class C1(object):
    pass

class C2(object):
    __slots__ = ["attrib"]

Your mileage may vary.

I'm looking into ways of speeding up HTML parsing via BeautifulSoup.

The solution to that is spelled "lxml".

</F>
Dec 21 '07 #16
MrJean1 wrote:
My mileage does vary, see this older post

<http://mail.python.org/pipermail/python-list/2004-May/261985.html>

Similar figures are shown with Python 2.5, both for 32- and 64-bit.
unless I'm missing something, you're measuring object creation time.

I'm measuring attribute access time (the topic was "navigating large
trees of small objects", not building them).

</F>

Dec 21 '07 #17
On Dec 21, 2007 2:07 PM, Fredrik Lundh <fr*****@pythonware.com> wrote:
John Nagle wrote:
I'd like to hear more about what kind of performance gain can be
obtained from "__slots__". I'm looking into ways of speeding up
HTML parsing via BeautifulSoup. If a significant speedup can be
obtained when navigating large trees of small objects, that's worth
quite a bit to me.

The following micro-benchmarks are from Python 2.5 on a Core Duo
machine. C0 is an old-style class, C1 is a new-style class, C2 is a
new-style class using __slots__:

# read access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.133 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.184 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.161 usec per loop

# write access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib = 1"
10000000 loops, best of 3: 0.15 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.217 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.209 usec per loop

$ more q.py
class C0:
    pass

class C1(object):
    pass

class C2(object):
    __slots__ = ["attrib"]

Your mileage may vary.

Here are my timings of object creation time. Note that as you add
slots, the speed advantage disappears.

C:\>python -m timeit -s "from objs import C0 as c" "c()"
1000000 loops, best of 3: 0.298 usec per loop

C:\>python -m timeit -s "from objs import C1 as c" "c()"
10000000 loops, best of 3: 0.172 usec per loop

C:\>python -m timeit -s "from objs import C2 as c" "c()"
10000000 loops, best of 3: 0.164 usec per loop

C:\>python -m timeit -s "from objs import C3 as c" "c()"
1000000 loops, best of 3: 0.302 usec per loop

#objs.py
import string

class C0:
    pass

class C1(object):
    pass

class C2(object):
    __slots__ = ["foo"]

class C3(object):
    __slots__ = list(string.ascii_letters)
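
The creation-time comparison can also be scripted so that the number of slots is a parameter. This is a sketch for Python 3; make_slotted_class and creation_time are hypothetical helper names, and the classes are built with type() only to avoid writing one class statement per slot count.

```python
import string
import timeit


def make_slotted_class(slot_names):
    """Build a class whose __slots__ are the given attribute names."""
    name = "Slotted%d" % len(slot_names)
    return type(name, (object,), {"__slots__": list(slot_names)})


def creation_time(cls, number=100000):
    """Return the best-of-3 time in seconds to create one instance."""
    timings = timeit.repeat("c()", globals={"c": cls}, repeat=3, number=number)
    return min(timings) / number


if __name__ == "__main__":
    for count in (1, 10, 52):
        cls = make_slotted_class(string.ascii_letters[:count])
        usec = creation_time(cls) * 1e6
        print("%2d slots: %.3f usec per loop" % (count, usec))
```

With 52 slots this corresponds to C3 above; whether the slowdown reported in this post shows up will depend on the interpreter version.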
Dec 21 '07 #18
You are correct. Mea culpa.

/Jean Brouwers

On Dec 21, 1:41 pm, Fredrik Lundh <fred...@pythonware.com> wrote:
MrJean1 wrote:
My mileage does vary, see this older post
<http://mail.python.org/pipermail/python-list/2004-May/261985.html>
Similar figures are shown with Python 2.5, both for 32- and 64-bit.

unless I'm missing something, you're measuring object creation time.

I'm measuring attribute access time (the topic was "navigating large
trees of small objects", not building them).

</F>
Dec 21 '07 #19
Fredrik Lundh wrote:
John Nagle wrote:
> I'd like to hear more about what kind of performance gain can be
obtained from "__slots__". I'm looking into ways of speeding up
HTML parsing via BeautifulSoup. If a significant speedup can be
obtained when navigating large trees of small objects, that's worth
quite a bit to me.

The following micro-benchmarks are from Python 2.5 on a Core Duo
machine. C0 is an old-style class, C1 is a new-style class, C2 is a
new-style class using __slots__:

# read access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.133 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.184 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.161 usec per loop

# write access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib = 1"
10000000 loops, best of 3: 0.15 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.217 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.209 usec per loop
Not much of a win there. Thanks.
>
I'm looking into ways of speeding up HTML parsing via BeautifulSoup.

The solution to that is spelled "lxml".
I may eventually have to go to a non-Python solution.
But I've finally made enough robustness fixes to BeautifulSoup that
it's usable on large numbers of real-world web sites. (Only two
exceptions in the last 100,000 web sites processed. If you want
to exercise your HTML parser on hard cases, run hostile-code web
sites through it.)

John Nagle
Dec 23 '07 #20
