Why custom objects take so much memory?

Hi,

after a couple of days of debugging a script, I found that some
assumptions I was making about the memory complexity of my classes
were not true. I wrote a simple script to isolate the problem:

import gc

class MyClass:
    def __init__(self, s):
        self.mystring = s

mylist = []
for i in range(1024*1024):
    mylist.append(MyClass(str(i)))  # allocation
# stage 1
mylist = None
gc.collect()
# stage 2

I measured the memory consumption of the script at #stage 1 and
#stage 2 and got:
#stage 1: 238 MB
#stage 2: 15 MB

That means every object is around 223 bytes in size! That's far too
much, considering it only contains a string of at most 7 chars.

If you change the allocation line to this one:
mylist.append(str(i))  # don't create the custom class; append the string directly to the list
the numbers decrease substantially to:
#stage 1: 47.6 MB
#stage 2: 15 MB
(so this time we can say each string occupies around 32 bytes... still a
lot, isn't it?)

So what exactly is going on behind the scenes? Why are custom
objects SO expensive? What other ways of creating structures are
cheaper in memory usage?

Thanks a lot in advance!

Dec 18 '07 #1
On Dec 18, 2007 1:26 PM, jsanshef <js*******@gmail.com> wrote:
Hi,

after a couple of days of script debugging, I kind of found that some
assumptions I was doing about the memory complexity of my classes are
not true. I decided to do a simple script to isolate the problem:

class MyClass:
    def __init__(self,s):
        self.mystring = s

mylist = []
for i in range(1024*1024):
    mylist.append(MyClass(str(i))) #allocation
#stage 1
mylist = None
gc.collect()
#stage 2

I take measures of the memory consumption of the script at #stage1 and
#stage 2 and I obtain:
#stage1 -238MB
#stage2 -15MB

That means every object is around 223 bytes in size!!!! That's too
much considering it only contains a string with a maximum size of 7
chars.
Class instances are fairly heavyweight: in your case you've got the size
of the PyObject struct, a dictionary for instance attributes (which is
itself another PyObject), the string object (yet another PyObject),
and the actual string data.
If you change the allocation line for this other:
>mylist.append(str(i)) #we don't create the custom class, but append the string directly into the list

the numbers decrease substantially to:
#stage1 -47.6MB
#stage2 -15MB
(so this time we can say string vars occupy around 32 bytes....still a
lot, isn't it?)
String objects don't have dictionaries and are smaller than "regular"
Python objects.
So, what's exactly going on behind the scenes? Why is using custom
objects SO expensive? What other ways of creating structures can be
used (cheaper in memory usage)?
If you're worried about per-instance memory costs, Python is probably
not the language for your purposes. On the other hand, odds are that
you actually don't need to worry so much.

You can reduce the size of new-style classes (inherit from object) by
quite a bit if you use __slots__ to eliminate the class dictionary.
Dec 18 '07 #2
jsanshef <js*******@gmail.com> writes:
That means every object is around 223 bytes in size!!!! That's too
much considering it only contains a string with a maximum size of 7
chars.
The list itself consumes 4 MB because it stores 1 million PyObject
pointers. It possibly consumes more due to overallocation, but let's
ignore that.

Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr
+ 4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's
not counting malloc overhead, which should be low since objects aren't
malloced individually. Each object requires a dict, which consumes an
additional 52 bytes of memory (40 bytes for the dict struct plus 12
for gc). That's 88 bytes per object, not counting malloc overhead.

Then there's string allocation: your average string is 6 chars long;
add to that one additional char for the terminating zero. The string
struct takes up 20 bytes + string length, rounded up to the nearest
alignment boundary. For your average case, that's 27 bytes, rounded
(I assume) to 28.
You also allocate 1024*1024 integers which are never freed (they're
kept on a free list), and each of which takes up at least 12 bytes.

All that adds up to 128 bytes per object, dispersed over several
different object types. It doesn't surprise me that Python is eating
200+ MB of memory.
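The arithmetic above can be replayed in a few lines of Python. This is only a sketch: every constant is a 32-bit Python 2.5 figure quoted in this post rather than something measured here, and the 4-byte rounding of the string allocation is an assumption (as the post itself notes). Adding up the listed instance fields gives 28 bytes rather than the stated 36, so the tally lands at 120 bytes per object, or 124 with the list slot, still well short of the ~223 bytes measured in the original post.

```python
# Per-object byte estimates for 32-bit Python 2.5, as listed in the post above.
INSTANCE = 4 + 4 + 4 + 4 + 12  # refcount, type ptr, dict ptr, weakref ptr, GC header
DICT = 40 + 12                 # dict struct plus GC header (figure from the post)
STR_HEADER = 20                # string struct, excluding the character data
INT = 12                       # int object left on the free list
LIST_SLOT = 4                  # one PyObject* in the list
ALIGN = 4                      # assumed rounding for the string allocation

def round_up(n, align=ALIGN):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align

def average_len(count):
    """Average len(str(i)) over range(count)."""
    return sum(len(str(i)) for i in range(count)) / count

N = 1024 * 1024
chars = round(average_len(N))              # 5.94 on average, so ~6 chars
string = round_up(STR_HEADER + chars + 1)  # +1 for the trailing zero byte
per_object = INSTANCE + DICT + string + INT

print(INSTANCE, DICT, string, per_object)          # 28 52 28 120
print((per_object + LIST_SLOT) * N / 2**20, "MB")  # 124.0 MB
```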
So, what's exactly going on behind the scenes? Why is using custom
objects SO expensive? What other ways of creating structures can be
used (cheaper in memory usage)?

Thanks a lot in advance!
Use a new-style class and set __slots__:

class MyClass(object):
    __slots__ = ('mystring',)
    def __init__(self, s):
        self.mystring = s

That brings memory consumption down to ~80 MB by cutting down the size
of each object instance and removing the dict.
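On a modern Python 3 interpreter (not the 2.5 used in this thread), the effect of __slots__ can be checked roughly with sys.getsizeof, which reports only the shallow size of an object. A sketch with hypothetical class names; footprint() adds the instance dict when there is one, and the exact byte counts will vary by version and platform.

```python
import sys

class WithDict:
    def __init__(self, s):
        self.mystring = s

class WithSlots:
    __slots__ = ("mystring",)  # no per-instance __dict__ is created

    def __init__(self, s):
        self.mystring = s

def footprint(obj):
    """Shallow size of obj plus its __dict__, if it has one."""
    size = sys.getsizeof(obj)
    d = getattr(obj, "__dict__", None)
    if d is not None:
        size += sys.getsizeof(d)
    return size

plain = WithDict("123456")
slim = WithSlots("123456")
print(footprint(plain), footprint(slim))  # numbers depend on version/platform
```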
Dec 18 '07 #3
In article <ma***************************************@python.org>,
Chris Mellon <ar*****@gmail.com> wrote:
>
You can reduce the size of new-style classes (inherit from object) by
quite a bit if you use __slots__ to eliminate the class dictionary.
You can also reduce your functionality quite a bit by using __slots__.
Someday I'll have time to write up a proper page about why you shouldn't
use __slots__....
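To make the trade-off concrete, here is a Python 3 sketch (class names are hypothetical) of what a slotted class gives up by default: adding unlisted attributes, weak references, and vars(). Listing '__weakref__' in __slots__ restores weak references.

```python
import weakref

class Slotted:
    __slots__ = ("mystring",)

    def __init__(self, s):
        self.mystring = s

class SlottedWeak:
    __slots__ = ("mystring", "__weakref__")  # opt back in to weak references

    def __init__(self, s):
        self.mystring = s

obj = Slotted("abc")

try:
    obj.extra = 1            # not listed in __slots__
except AttributeError as exc:
    print("no new attributes:", exc)

try:
    weakref.ref(obj)         # '__weakref__' is not in __slots__
except TypeError as exc:
    print("no weak references:", exc)

try:
    vars(obj)                # there is no __dict__ to return
except TypeError as exc:
    print("no vars():", exc)

keep = SlottedWeak("abc")
ref = weakref.ref(keep)      # works once '__weakref__' is a slot
print(ref() is keep)         # True
```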
--
Aahz (aa**@pythoncraft.com) <* http://www.pythoncraft.com/

"Typing is cheap. Thinking is expensive." --Roy Smith
Dec 18 '07 #4
On Dec 18, 4:49 pm, a...@pythoncraft.com (Aahz) wrote:
In article <mailman.2538.1198008758.13605.python-l...@python.org>,

Chris Mellon <arka...@gmail.com> wrote:
You can reduce the size of new-style classes (inherit from object) by
quite a bit if you use __slots__ to eliminate the class dictionary.

You can also reduce your functionality quite a bit by using __slots__.
Someday I'll have time to write up a proper page about why you shouldn't
use __slots__....
Shouting absolute commands without full understanding of the situation
is not going to help anyone.

The OP wanted to minimize memory usage, exactly the intended usage of
slots. Without knowing more about the OP's situation, I don't think
you or I or Chris Mellon can be sure it's not right for the OP's
situation.

You're obviously smart and highly expert in Python--I just wish you
would be more constructive with your advice.
Carl Banks
Dec 18 '07 #5
On Tue, 18 Dec 2007 21:13:14 +0100, Hrvoje Niksic wrote:
Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr +
4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's not
counting malloc overhead, which should be low since objects aren't
malloced individually. Each object requires a dict, which consumes
additional 52 bytes of memory (40 bytes for the dict struct plus 12 for
gc). That's 88 bytes per object, not counting malloc overhead.
And let's not forget that if you're running on a 64-bit system, you can
double the size of every pointer.

Is there a canonical list of how much memory Python objects take up? Or a
canonical algorithm?

Or failing either of those, a good heuristic?

Then there's string allocation: your average string is 6 chars long; add
to that one additional char for the terminating zero.
Are you sure about that? If Python strings are zero terminated, how does
Python deal with this?
>>> 'a\0string'[1]
'\x00'


--
Steven
Dec 19 '07 #6
Steven D'Aprano wrote:
On Tue, 18 Dec 2007 21:13:14 +0100, Hrvoje Niksic wrote:
>Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr +
4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's not
counting malloc overhead, which should be low since objects aren't
malloced individually. Each object requires a dict, which consumes
additional 52 bytes of memory (40 bytes for the dict struct plus 12 for
gc). That's 88 bytes per object, not counting malloc overhead.

And let's not forget that if you're running on a 64-bit system, you can
double the size of every pointer.

Is there a canonical list of how much memory Python objects take up? Or a
canonical algorithm?
Here is Martin v. Löwis giving a few pointers (pardon the pun):

http://mail.python.org/pipermail/pyt...ch/135223.html
http://groups.google.de/group/comp.l...e1de5b05?hl=de
Or failing either of those, a good heuristic?
I thought there was a tool that tried to estimate this using hints from the
type, but Googling has availed me not.
>Then there's string allocation: your average string is 6 chars long; add
to that one additional char for the terminating zero.

Are you sure about that?
Yes. Look at Include/stringobject.h:

"""
Type PyStringObject represents a character string. An extra zero byte is
reserved at the end to ensure it is zero-terminated, but a size is
present so strings with null bytes in them can be represented. This
is an immutable object type.
"""
If Python strings are zero terminated, how does
Python deal with this?
>>>'a\0string'[1]
'\x00'
It stores a length separate from the value. The 0-termination is a courtesy to C
APIs that expect 0-terminated strings. It does not define the end of the Python
string, though.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Dec 19 '07 #7
Steven D'Aprano <st****@REMOVE.THIS.cybersource.com.au> writes:
On Tue, 18 Dec 2007 21:13:14 +0100, Hrvoje Niksic wrote:
>Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr +
4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's not
counting malloc overhead, which should be low since objects aren't
malloced individually. Each object requires a dict, which consumes
additional 52 bytes of memory (40 bytes for the dict struct plus 12 for
gc). That's 88 bytes per object, not counting malloc overhead.

And let's not forget that if you're running on a 64-bit system, you
can double the size of every pointer.
And of Py_ssize_t's, longs, ints with padding (placed between two
pointers). Also note the price of 8-byte struct alignment.
Is there a canonical list of how much memory Python objects take up?
Or a canonical algorithm?

Or failing either of those, a good heuristic?
For built-in types, you need to look at the code of each individual
object. For user types, you can approximate by calculations such as
the above.
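For user types, one empirical heuristic on Python 3.4 or later (the tool did not exist in the 2.5 era of this thread) is to trace allocations with tracemalloc while building many instances and divide by the count. A sketch; bytes_per_item() is a hypothetical helper, and the figures include the list's own pointer and overallocation.

```python
import tracemalloc

class MyClass:
    def __init__(self, s):
        self.mystring = s

def bytes_per_item(factory, n=100_000):
    """Average traced bytes allocated per item when building [factory(i) ...]."""
    tracemalloc.start()
    try:
        base, _ = tracemalloc.get_traced_memory()
        items = [factory(i) for i in range(n)]  # keep the list alive while measuring
        used, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return (used - base) / len(items)

print(f"str only:     {bytes_per_item(str):.0f} bytes/item")
print(f"MyClass(str): {bytes_per_item(lambda i: MyClass(str(i))):.0f} bytes/item")
```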
>Then there's string allocation: your average string is 6 chars
long; add to that one additional char for the terminating zero.

Are you sure about that? If Python strings are zero terminated, how
does Python deal with this?
>>>'a\0string'[1]
'\x00'
Python strings are zero-terminated so the pointer to string's data can
be passed to the various C APIs (this is standard practice, C++
strings do it too.) Python doesn't rely on zero termination to
calculate string length. So len('a\0string') will do the right thing,
but the string will internally store 'a\0string\0'.
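In Python 3 the 8-bit string type discussed here is `bytes`. The sketch below uses the standard-library ctypes module (it is not from the thread) to show both views of the same data: Python keeps an explicit length, while a C-style NUL-terminated read stops at the first zero byte.

```python
import ctypes

s = b"a\0string"            # 8 bytes, one of them a NUL
print(len(s), s[1])          # 8 0; Python keeps an explicit length

buf = ctypes.create_string_buffer(s)  # C char array sized len(s) + 1
print(buf.raw)               # b'a\x00string\x00'; trailing zero added for C
print(buf.value)             # b'a'; a C-style reader stops at the first NUL
```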
Dec 19 '07 #8
Thank you all for your useful comments and suggestions!! They're a
great starting point to redesign my script completely ;)
Cheers!
Dec 19 '07 #9
Hrvoje Niksic wrote:
Steven D'Aprano <st****@REMOVE.THIS.cybersource.com.au> writes:
>On Tue, 18 Dec 2007 21:13:14 +0100, Hrvoje Niksic wrote:
>>Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr +
4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's not
counting malloc overhead, which should be low since objects aren't
malloced individually. Each object requires a dict, which consumes
additional 52 bytes of memory (40 bytes for the dict struct plus 12 for
gc). That's 88 bytes per object, not counting malloc overhead.
And let's not forget that if you're running on a 64-bit system, you
can double the size of every pointer.

And of Py_ssize_t's, longs, ints with padding (placed between two
pointers). Also note the price of 8-byte struct alignment.
>Is there a canonical list of how much memory Python objects take up?
Or a canonical algorithm?

Or failing either of those, a good heuristic?

For built-in types, you need to look at the code of each individual
object. For user types, you can approximate by calculations such as
the above.
It would be helpful if there were a tabulation of the memory cost
for each built-in type.

Colin W.
>
>>Then there's string allocation: your average string is 6 chars
long; add to that one additional char for the terminating zero.
Are you sure about that? If Python strings are zero terminated, how
does Python deal with this?
>>>>'a\0string'[1]
'\x00'

Python strings are zero-terminated so the pointer to string's data can
be passed to the various C APIs (this is standard practice, C++
strings do it too.) Python doesn't rely on zero termination to
calculate string length. So len('a\0string') will do the right thing,
but the string will internally store 'a\0string\0'.
Dec 20 '07 #10
In article <08**********************************@l32g2000hse.googlegroups.com>,
Carl Banks <pa************@gmail.com> wrote:
>On Dec 18, 4:49 pm, a...@pythoncraft.com (Aahz) wrote:
>In article <mailman.2538.1198008758.13605.python-l...@python.org>,
Chris Mellon <arka...@gmail.com> wrote:
>>>
You can reduce the size of new-style classes (inherit from object) by
quite a bit if you use __slots__ to eliminate the class dictionary.

You can also reduce your functionality quite a bit by using __slots__.
Someday I'll have time to write up a proper page about why you shouldn't
use __slots__....

Shouting absolute commands without full understanding of the situation
is not going to help anyone.
Maybe not, but at least it will get people to stop for a bit.
>The OP wanted to minimize memory usage, exactly the intended usage of
slots. Without knowing more about the OP's situation, I don't think
your or I or Chris Mellon can be sure it's not right for the OP's
situation.
The whole point about warning against __slots__ is that you should never
use them unless you are certain they're the best solution. Consider what
happens when the OP wants to subclass this __slots__-using class.
Avoiding __slots__ will almost never harm anyone, so I feel completely
comfortable sticking with a blanket warning.
--
Aahz (aa**@pythoncraft.com) <* http://www.pythoncraft.com/

"Typing is cheap. Thinking is expensive." --Roy Smith
Dec 21 '07 #12
On 20 Dec 2007 19:50:31 -0800, Aahz <aa**@pythoncraft.com> wrote:
In article <08**********************************@l32g2000hse.googlegroups.com>,
Carl Banks <pa************@gmail.com> wrote:
On Dec 18, 4:49 pm, a...@pythoncraft.com (Aahz) wrote:
In article <mailman.2538.1198008758.13605.python-l...@python.org>,
Chris Mellon <arka...@gmail.com> wrote:

You can reduce the size of new-style classes (inherit from object) by
quite a bit if you use __slots__ to eliminate the class dictionary.

You can also reduce your functionality quite a bit by using __slots__.
Someday I'll have time to write up a proper page about why you shouldn't
use __slots__....
Shouting absolute commands without full understanding of the situation
is not going to help anyone.

Maybe not, but at least it will get people to stop for a bit.
No, it will just make people start ignoring you because you give
inappropriate advice forbidding reasonable solutions without a
rational justification.
The OP wanted to minimize memory usage, exactly the intended usage of
slots. Without knowing more about the OP's situation, I don't think
your or I or Chris Mellon can be sure it's not right for the OP's
situation.

The whole point about warning against __slots__ is that you should never
use them unless you are certain they're the best solution. Consider what
happens when the OP wants to subclass this __slots__-using class.
Nothing. Subclasses of a class with __slots__ get a dict just like
anything else, unless they also define slots. Why do you think you can
subclass object to get something you can stick arbitrary attributes
on?
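Chris's point about subclasses can be checked directly. A minimal Python 3 sketch with hypothetical class names: a subclass that does not declare __slots__ gets a __dict__ back, while one that declares an empty __slots__ stays dict-free.

```python
class Base:
    __slots__ = ("mystring",)

class Sub(Base):
    pass  # no __slots__ here, so instances get a __dict__ again

class SubSlotted(Base):
    __slots__ = ()  # adds no new slots and keeps instances dict-free

s = Sub()
s.anything = 1          # allowed: Sub instances have a __dict__
print(vars(s))          # {'anything': 1}

t = SubSlotted()
t.mystring = "ok"       # the inherited slot still works
try:
    t.anything = 1
except AttributeError:
    print("SubSlotted instances have no __dict__")
```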
Avoiding __slots__ will almost never harm anyone, so I feel completely
comfortable sticking with a blanket warning.
Barking out your blanket warning in a thread on *the exact use case
slots were implemented to address* just makes you look like a mindless
reactionary. Stick to posting your warning in threads where people ask
how to stop "other people" from setting attributes on "their"
instances.
Dec 21 '07 #13
Chris Mellon wrote:
On 20 Dec 2007 19:50:31 -0800, Aahz <aa**@pythoncraft.comwrote:
>In article <08**********************************@l32g2000hse.googlegroups.com>,
Carl Banks <pa************@gmail.com> wrote:
>>>Someday I'll have time to write up a proper page about why you shouldn't
use __slots__....
Barking out your blanket warning in a thread on *the exact use case
slots were implemented to address* just makes you look like a mindless
reactionary. Stick to posting your warning in threads where people ask
how to stop "other people" from setting attributes on "their"
instances.
Agreed.

I'd like to hear more about what kind of performance gain can be
obtained from "__slots__". I'm looking into ways of speeding up
HTML parsing via BeautifulSoup. If a significant speedup can be
obtained when navigating large trees of small objects, that's worth
quite a bit to me.

John Nagle
SiteTruth
Dec 21 '07 #14
John Nagle wrote:
I'd like to hear more about what kind of performance gain can be
obtained from "__slots__". I'm looking into ways of speeding up
HTML parsing via BeautifulSoup. If a significant speedup can be
obtained when navigating large trees of small objects, that's worth
quite a bit to me.
The following micro-benchmarks are from Python 2.5 on a Core Duo
machine. C0 is an old-style class, C1 is a new-style class, C2 is a
new-style class using __slots__:

# read access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.133 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.184 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.161 usec per loop

# write access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib = 1"
10000000 loops, best of 3: 0.15 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.217 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.209 usec per loop

$ more q.py
class C0:
    pass

class C1(object):
    pass

class C2(object):
    __slots__ = ["attrib"]

Your mileage may vary.
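Python 3 has no old-style classes, so only C1 and C2 carry over. A rough equivalent of these measurements using the timeit module's repeat() function might look like the sketch below; bench() is a hypothetical helper, and the numbers it prints depend entirely on the interpreter and machine.

```python
import timeit

class C1:
    pass

class C2:
    __slots__ = ["attrib"]

def bench(cls, stmt, number=200_000):
    """Best-of-3 seconds per loop for stmt, with o bound to a prepared instance."""
    o = cls()
    o.attrib = 1
    times = timeit.repeat(stmt, globals={"o": o}, repeat=3, number=number)
    return min(times) / number

for cls in (C1, C2):
    read = bench(cls, "o.attrib")
    write = bench(cls, "o.attrib = 1")
    print(f"{cls.__name__}: read {read * 1e9:.1f} ns, write {write * 1e9:.1f} ns")
```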
I'm looking into ways of speeding up HTML parsing via BeautifulSoup.
The solution to that is spelled "lxml".

</F>

Dec 21 '07 #15
My mileage does vary; see this older post

<http://mail.python.org/pipermail/python-list/2004-May/261985.html>

Similar figures are shown with Python 2.5, both for 32- and 64-bit.

/Jean Brouwers
Dec 21 '07 #16
MrJean1 wrote:
My milage does vary, see this older post

<http://mail.python.org/pipermail/python-list/2004-May/261985.html>

Similar figures are shown with Python 2.5, both for 32- and 64-bit.
Unless I'm missing something, you're measuring object creation time.

I'm measuring attribute access time (the topic was "navigating large
trees of small objects", not building them).

</F>

Dec 21 '07 #17
On Dec 21, 2007 2:07 PM, Fredrik Lundh <fr*****@pythonware.com> wrote:
John Nagle wrote:
I'd like to hear more about what kind of performance gain can be
obtained from "__slots__". I'm looking into ways of speeding up
HTML parsing via BeautifulSoup. If a significant speedup can be
obtained when navigating large trees of small objects, that's worth
quite a bit to me.

The following micro-benchmarks are from Python 2.5 on a Core Duo
machine. C0 is an old-style class, C1 is a new-style class, C2 is a
new-style class using __slots__:

# read access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.133 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.184 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.161 usec per loop

# write access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib = 1"
10000000 loops, best of 3: 0.15 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.217 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.209 usec per loop

$ more q.py
class C0:
    pass

class C1(object):
    pass

class C2(object):
    __slots__ = ["attrib"]

Your mileage may vary.

Here are my timings of object creation. Note that as you add
slots, the speed advantage disappears.

C:\>python -m timeit -s "from objs import C0 as c" "c()"
1000000 loops, best of 3: 0.298 usec per loop

C:\>python -m timeit -s "from objs import C1 as c" "c()"
10000000 loops, best of 3: 0.172 usec per loop

C:\>python -m timeit -s "from objs import C2 as c" "c()"
10000000 loops, best of 3: 0.164 usec per loop

C:\>python -m timeit -s "from objs import C3 as c" "c()"
1000000 loops, best of 3: 0.302 usec per loop

#objs.py
import string

class C0:
    pass

class C1(object):
    pass

class C2(object):
    __slots__ = ["foo"]

class C3(object):
    __slots__ = list(string.ascii_letters)
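To see whether creation cost grows with the number of slots on a current interpreter, one could generate the classes programmatically instead of writing C0..C3 by hand. A sketch, assuming Python 3; make_class() and creation_time() are hypothetical helpers, and no particular result is implied.

```python
import string
import timeit

def make_class(n_slots):
    """Build a class whose slots are the first n_slots ASCII letters."""
    names = tuple(string.ascii_letters[:n_slots])
    return type(f"Slots{n_slots}", (object,), {"__slots__": names})

def creation_time(cls, number=200_000):
    """Best-of-3 seconds per cls() call."""
    return min(timeit.repeat(cls, repeat=3, number=number)) / number

for n in (0, 1, 8, 52):
    cls = make_class(n)
    print(f"{n:2d} slots: {creation_time(cls) * 1e9:.1f} ns per instance")
```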
Dec 21 '07 #18
You are correct. Mea culpa.

/Jean Brouwers

Dec 21 '07 #19
Fredrik Lundh wrote:
John Nagle wrote:
> I'd like to hear more about what kind of performance gain can be
obtained from "__slots__". I'm looking into ways of speeding up
HTML parsing via BeautifulSoup. If a significant speedup can be
obtained when navigating large trees of small objects, that's worth
quite a bit to me.

The following micro-benchmarks are from Python 2.5 on a Core Duo
machine. C0 is an old-style class, C1 is a new-style class, C2 is a
new-style class using __slots__:

# read access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.133 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.184 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.161 usec per loop

# write access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib = 1"
10000000 loops, best of 3: 0.15 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.217 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.209 usec per loop
Not much of a win there. Thanks.
>
I'm looking into ways of speeding up HTML parsing via BeautifulSoup.

The solution to that is spelled "lxml".
I may eventually have to go to a non-Python solution.
But I've finally made enough robustness fixes to BeautifulSoup that
it's usable on large numbers of real-world web sites. (Only two
exceptions in the last 100,000 web sites processed. If you want
to exercise your HTML parser on hard cases, run hostile-code web
sites through it.)

John Nagle
Dec 23 '07 #20
