Bytes | Developer Community
Why do custom objects take so much memory?

Hi,

after a couple of days of script debugging, I found that some
assumptions I was making about the memory complexity of my classes
were not true. I wrote a simple script to isolate the problem:

import gc

class MyClass:
    def __init__(self, s):
        self.mystring = s

mylist = []
for i in range(1024*1024):
    mylist.append(MyClass(str(i)))  # allocation
# stage 1
mylist = None
gc.collect()
# stage 2

I measure the memory consumption of the script at #stage 1 and
#stage 2 and I obtain:
#stage 1 - 238MB
#stage 2 - 15MB

That means every object is around 223 bytes in size! That's too
much considering it only contains a string of at most 7 chars.

If you replace the allocation line with this one:
mylist.append(str(i))  # append the string directly, without the custom class
the numbers decrease substantially to:
#stage 1 - 47.6MB
#stage 2 - 15MB
(so this time we can say string vars occupy around 32 bytes... still
a lot, isn't it?)

So, what exactly is going on behind the scenes? Why are custom
objects SO expensive? What other ways of creating structures can be
used (cheaper in memory usage)?

Thanks a lot in advance!

Dec 18 '07 #1
On Dec 18, 2007 1:26 PM, jsanshef <js*******@gmail.com> wrote:
> Hi,
>
> after a couple of days of script debugging, I found that some
> assumptions I was making about the memory complexity of my classes
> were not true. I wrote a simple script to isolate the problem:
>
> class MyClass:
>     def __init__(self, s):
>         self.mystring = s
>
> mylist = []
> for i in range(1024*1024):
>     mylist.append(MyClass(str(i)))  # allocation
> # stage 1
> mylist = None
> gc.collect()
> # stage 2
>
> I measure the memory consumption of the script at #stage 1 and
> #stage 2 and I obtain:
> #stage 1 - 238MB
> #stage 2 - 15MB
>
> That means every object is around 223 bytes in size! That's too
> much considering it only contains a string of at most 7 chars.

Classes are fairly heavyweight - in your case you've got the size of
the PyObject struct, a dictionary for instance attributes (which is
itself another PyObject), the string object (yet another PyObject),
and the actual string data.

> If you replace the allocation line with:
> mylist.append(str(i))  # append the string directly, without the custom class
> the numbers decrease substantially to:
> #stage 1 - 47.6MB
> #stage 2 - 15MB
> (so this time we can say string vars occupy around 32 bytes... still
> a lot, isn't it?)

String objects don't have dictionaries and are smaller than "regular"
Python objects.

> So, what exactly is going on behind the scenes? Why are custom
> objects SO expensive? What other ways of creating structures can be
> used (cheaper in memory usage)?

If you're worried about per-instance memory costs, Python is probably
not the language for your purposes. On the other hand, odds are that
you actually don't need to worry so much.

You can reduce the size of new-style classes (inherit from object) by
quite a bit if you use __slots__ to eliminate the class dictionary.
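For illustration, here is a minimal sketch of what __slots__ changes (Python 3 syntax; the class names are hypothetical and the exact byte counts vary by version and platform):

```python
import sys

class Plain(object):
    def __init__(self, s):
        self.mystring = s

class Slotted(object):
    __slots__ = ('mystring',)
    def __init__(self, s):
        self.mystring = s

p = Plain('1234567')
q = Slotted('1234567')

# The plain instance carries a per-instance attribute dict;
# the slotted one has no __dict__ at all.
print(hasattr(p, '__dict__'))   # True
print(hasattr(q, '__dict__'))   # False

# The per-instance dict is a real object with its own footprint.
print(sys.getsizeof(p.__dict__))
```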
Dec 18 '07 #2
jsanshef <js*******@gmail.com> writes:
> That means every object is around 223 bytes in size!!!! That's too
> much considering it only contains a string with a maximum size of 7
> chars.
The list itself consumes 4 MB because it stores 1 million PyObject
pointers. It possibly consumes more due to overallocation, but let's
ignore that.

Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr
+ 4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's
not counting malloc overhead, which should be low since objects aren't
malloced individually. Each object requires a dict, which consumes an
additional 52 bytes of memory (40 bytes for the dict struct plus 12
for gc). That's 88 bytes per object, not counting malloc overhead.

Then there's string allocation: your average string is 6 chars long;
add to that one additional char for the terminating zero. The string
struct takes up 20 bytes + string length, rounded to nearest
alignment. For your average case, that's 27 bytes, rounded (I assume) to 28.
You also allocate 1024*1024 integers which are never freed (they're
kept on a free list), and each of which takes up at least 12 bytes.

All that adds up to 128 bytes per object, dispersed over several
different object types. It doesn't surprise me that Python is eating
200+ MB of memory.
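On interpreters that provide sys.getsizeof (added in Python 2.6, after this thread), a similar accounting can be sketched by hand; the numbers will differ from the 32-bit Python 2.5 figures above:

```python
import sys

class MyClass:
    def __init__(self, s):
        self.mystring = s

o = MyClass('1234567')

# getsizeof reports only the object struct itself, so the instance
# dict and the string it references have to be added in separately.
total = (sys.getsizeof(o)
         + sys.getsizeof(o.__dict__)
         + sys.getsizeof(o.mystring))
print(total)
```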
> So, what's exactly going on behind the scenes? Why is using custom
> objects SO expensive? What other ways of creating structures can be
> used (cheaper in memory usage)?
>
> Thanks a lot in advance!
Use a new-style class and set __slots__:

class MyClass(object):
    __slots__ = ('mystring',)
    def __init__(self, s):
        self.mystring = s

That brings memory consumption down to ~80MB, by cutting down the
size of the object instance and removing the dict.
Dec 18 '07 #3
In article <ma***************************************@python.org>,
Chris Mellon <ar*****@gmail.com> wrote:
>
> You can reduce the size of new-style classes (inherit from object) by
> quite a bit if you use __slots__ to eliminate the class dictionary.
You can also reduce your functionality quite a bit by using __slots__.
Someday I'll have time to write up a proper page about why you shouldn't
use __slots__....
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"Typing is cheap. Thinking is expensive." --Roy Smith
Dec 18 '07 #4
On Dec 18, 4:49 pm, a...@pythoncraft.com (Aahz) wrote:
> In article <mailman.2538.1198008758.13605.python-l...@python.org>,
> Chris Mellon <arka...@gmail.com> wrote:
>> You can reduce the size of new-style classes (inherit from object) by
>> quite a bit if you use __slots__ to eliminate the class dictionary.
>
> You can also reduce your functionality quite a bit by using __slots__.
> Someday I'll have time to write up a proper page about why you shouldn't
> use __slots__....
Shouting absolute commands without full understanding of the situation
is not going to help anyone.

The OP wanted to minimize memory usage, exactly the intended usage of
slots. Without knowing more about the OP's situation, I don't think
you or I or Chris Mellon can be sure it's not right for the OP's
situation.

You're obviously smart and highly expert in Python--I just wish you
would be more constructive with your advice.
Carl Banks
Dec 18 '07 #5
On Tue, 18 Dec 2007 21:13:14 +0100, Hrvoje Niksic wrote:
> Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr +
> 4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's not
> counting malloc overhead, which should be low since objects aren't
> malloced individually. Each object requires a dict, which consumes
> additional 52 bytes of memory (40 bytes for the dict struct plus 12 for
> gc). That's 88 bytes per object, not counting malloc overhead.
And let's not forget that if you're running on a 64-bit system, you can
double the size of every pointer.

Is there a canonical list of how much memory Python objects take up? Or a
canonical algorithm?

Or failing either of those, a good heuristic?

> Then there's string allocation: your average string is 6 chars long;
> add to that one additional char for the terminating zero.

Are you sure about that? If Python strings are zero terminated, how does
Python deal with this?

>>> 'a\0string'[1]
'\x00'


--
Steven
Dec 19 '07 #6
Steven D'Aprano wrote:
> On Tue, 18 Dec 2007 21:13:14 +0100, Hrvoje Niksic wrote:
>> Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr +
>> 4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's not
>> counting malloc overhead, which should be low since objects aren't
>> malloced individually. Each object requires a dict, which consumes
>> additional 52 bytes of memory (40 bytes for the dict struct plus 12 for
>> gc). That's 88 bytes per object, not counting malloc overhead.
>
> And let's not forget that if you're running on a 64-bit system, you can
> double the size of every pointer.
>
> Is there a canonical list of how much memory Python objects take up? Or a
> canonical algorithm?
Here is Martin v. Löwis giving a few pointers (pardon the pun):

http://mail.python.org/pipermail/pyt...ch/135223.html
http://groups.google.de/group/comp.l...e1de5b05?hl=de
> Or failing either of those, a good heuristic?
I thought there was a tool that tried to estimate this using hints from the
type, but Googling has availed me not.
>> Then there's string allocation: your average string is 6 chars long;
>> add to that one additional char for the terminating zero.
>
> Are you sure about that?
Yes. Look at Include/stringobject.h:

"""
Type PyStringObject represents a character string. An extra zero byte is
reserved at the end to ensure it is zero-terminated, but a size is
present so strings with null bytes in them can be represented. This
is an immutable object type.
"""
> If Python strings are zero terminated, how does
> Python deal with this?
>
> >>> 'a\0string'[1]
> '\x00'
It stores a length separate from the value. The 0-termination is a courtesy to C
APIs that expect 0-terminated strings. It does not define the end of the Python
string, though.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Dec 19 '07 #7
Steven D'Aprano <st****@REMOVE.THIS.cybersource.com.au> writes:
> On Tue, 18 Dec 2007 21:13:14 +0100, Hrvoje Niksic wrote:
>> Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr +
>> 4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's not
>> counting malloc overhead, which should be low since objects aren't
>> malloced individually. Each object requires a dict, which consumes
>> additional 52 bytes of memory (40 bytes for the dict struct plus 12 for
>> gc). That's 88 bytes per object, not counting malloc overhead.
>
> And let's not forget that if you're running on a 64-bit system, you
> can double the size of every pointer.
And of Py_ssize_t's, longs, ints with padding (placed between two
pointers). Also note the price of 8-byte struct alignment.
> Is there a canonical list of how much memory Python objects take up?
> Or a canonical algorithm?
>
> Or failing either of those, a good heuristic?
For built-in types, you need to look at the code of each individual
object. For user types, you can approximate by calculations such as
the above.
>> Then there's string allocation: your average string is 6 chars
>> long; add to that one additional char for the terminating zero.
>
> Are you sure about that? If Python strings are zero terminated, how
> does Python deal with this?
>
> >>> 'a\0string'[1]
> '\x00'
Python strings are zero-terminated so that the pointer to the string's
data can be passed to the various C APIs (this is standard practice;
C++ strings do it too). Python doesn't rely on zero termination to
calculate string length, so len('a\0string') will do the right thing,
but the string will internally store 'a\0string\0'.
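A quick check (the behaviour is the same in Python 2 and 3) confirms that the length is stored separately and embedded NULs are preserved:

```python
s = 'a\0string'

# len() uses the stored length field, not the trailing NUL that the
# C level appends, so the embedded NUL counts as a normal character.
print(len(s))          # 8
print(s[1] == '\x00')  # True
```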
Dec 19 '07 #8
Thank you all for your useful comments and suggestions!! They're a
great starting point to redesign my script completely ;)
Cheers!
Dec 19 '07 #9
Hrvoje Niksic wrote:
> Steven D'Aprano <st****@REMOVE.THIS.cybersource.com.au> writes:
>> On Tue, 18 Dec 2007 21:13:14 +0100, Hrvoje Niksic wrote:
>>> Each object takes 36 bytes itself: 4 bytes refcount + 4 bytes type ptr +
>>> 4 bytes dict ptr + 4 bytes weakptr + 12 bytes gc overhead. That's not
>>> counting malloc overhead, which should be low since objects aren't
>>> malloced individually. Each object requires a dict, which consumes
>>> additional 52 bytes of memory (40 bytes for the dict struct plus 12 for
>>> gc). That's 88 bytes per object, not counting malloc overhead.
>>
>> And let's not forget that if you're running on a 64-bit system, you
>> can double the size of every pointer.
>
> And of Py_ssize_t's, longs, ints with padding (placed between two
> pointers). Also note the price of 8-byte struct alignment.
>
>> Is there a canonical list of how much memory Python objects take up?
>> Or a canonical algorithm?
>>
>> Or failing either of those, a good heuristic?
>
> For built-in types, you need to look at the code of each individual
> object. For user types, you can approximate by calculations such as
> the above.

It would be helpful if there were a tabulation of the memory cost for
each built-in type.

Colin W.

>>> Then there's string allocation: your average string is 6 chars
>>> long; add to that one additional char for the terminating zero.
>>
>> Are you sure about that? If Python strings are zero terminated, how
>> does Python deal with this?
>>
>> >>> 'a\0string'[1]
>> '\x00'
>
> Python strings are zero-terminated so the pointer to string's data can
> be passed to the various C APIs (this is standard practice, C++
> strings do it too.) Python doesn't rely on zero termination to
> calculate string length. So len('a\0string') will do the right thing,
> but the string will internally store 'a\0string\0'.
Dec 20 '07 #10
In article <08**********************************@l32g2000hse.googlegroups.com>,
Carl Banks <pa************@gmail.com> wrote:
> On Dec 18, 4:49 pm, a...@pythoncraft.com (Aahz) wrote:
>> In article <mailman.2538.1198008758.13605.python-l...@python.org>,
>> Chris Mellon <arka...@gmail.com> wrote:
>>> You can reduce the size of new-style classes (inherit from object) by
>>> quite a bit if you use __slots__ to eliminate the class dictionary.
>>
>> You can also reduce your functionality quite a bit by using __slots__.
>> Someday I'll have time to write up a proper page about why you shouldn't
>> use __slots__....
>
> Shouting absolute commands without full understanding of the situation
> is not going to help anyone.
Maybe not, but at least it will get people to stop for a bit.
> The OP wanted to minimize memory usage, exactly the intended usage of
> slots. Without knowing more about the OP's situation, I don't think
> you or I or Chris Mellon can be sure it's not right for the OP's
> situation.
The whole point about warning against __slots__ is that you should never
use them unless you are certain they're the best solution. Consider what
happens when the OP wants to subclass this __slots__-using class.
Avoiding __slots__ will almost never harm anyone, so I feel completely
comfortable sticking with a blanket warning.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"Typing is cheap. Thinking is expensive." --Roy Smith
Dec 21 '07 #12
On 20 Dec 2007 19:50:31 -0800, Aahz <aa**@pythoncraft.com> wrote:
> In article <08**********************************@l32g2000hse.googlegroups.com>,
> Carl Banks <pa************@gmail.com> wrote:
>> On Dec 18, 4:49 pm, a...@pythoncraft.com (Aahz) wrote:
>>> In article <mailman.2538.1198008758.13605.python-l...@python.org>,
>>> Chris Mellon <arka...@gmail.com> wrote:
>>>> You can reduce the size of new-style classes (inherit from object) by
>>>> quite a bit if you use __slots__ to eliminate the class dictionary.
>>>
>>> You can also reduce your functionality quite a bit by using __slots__.
>>> Someday I'll have time to write up a proper page about why you shouldn't
>>> use __slots__....
>>
>> Shouting absolute commands without full understanding of the situation
>> is not going to help anyone.
>
> Maybe not, but at least it will get people to stop for a bit.
No, it will just make people stop ignoring you because you give
inappropriate advice forbidding reasonable solutions without a
rational justification.
>> The OP wanted to minimize memory usage, exactly the intended usage of
>> slots. Without knowing more about the OP's situation, I don't think
>> you or I or Chris Mellon can be sure it's not right for the OP's
>> situation.
>
> The whole point about warning against __slots__ is that you should never
> use them unless you are certain they're the best solution. Consider what
> happens when the OP wants to subclass this __slots__-using class.
Nothing. Subclasses of a class with __slots__ get a dict just like
anything else, unless they also define slots. Why do you think you can
subclass object to get something you can stick arbitrary attributes
on?
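That behaviour is easy to check; a quick sketch (Python 3 syntax, hypothetical class names):

```python
class Slotted(object):
    __slots__ = ('x',)

class Child(Slotted):
    # No __slots__ here, so Child instances regain a normal __dict__.
    pass

s = Slotted()
c = Child()

c.anything = 42  # fine: the subclass has a per-instance dict again
print(hasattr(c, '__dict__'))  # True
print(hasattr(s, '__dict__'))  # False
```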
> Avoiding __slots__ will almost never harm anyone, so I feel completely
> comfortable sticking with a blanket warning.
Barking out your blanket warning in a thread on *the exact use case
slots were implemented to address* just makes you look like a mindless
reactionary. Stick to posting your warning in threads where people ask
how to stop "other people" from setting attributes on "their"
instances.
Dec 21 '07 #13
Chris Mellon wrote:
> On 20 Dec 2007 19:50:31 -0800, Aahz <aa**@pythoncraft.com> wrote:
>> In article <08**********************************@l32g2000hse.googlegroups.com>,
>> Carl Banks <pa************@gmail.com> wrote:
>>> Someday I'll have time to write up a proper page about why you shouldn't
>>> use __slots__....
>
> Barking out your blanket warning in a thread on *the exact use case
> slots were implemented to address* just makes you look like a mindless
> reactionary. Stick to posting your warning in threads where people ask
> how to stop "other people" from setting attributes on "their"
> instances.
Agreed.

I'd like to hear more about what kind of performance gain can be
obtained from "__slots__". I'm looking into ways of speeding up
HTML parsing via BeautifulSoup. If a significant speedup can be
obtained when navigating large trees of small objects, that's worth
quite a bit to me.

John Nagle
SiteTruth
Dec 21 '07 #14
John Nagle wrote:
> I'd like to hear more about what kind of performance gain can be
> obtained from "__slots__". I'm looking into ways of speeding up
> HTML parsing via BeautifulSoup. If a significant speedup can be
> obtained when navigating large trees of small objects, that's worth
> quite a bit to me.
The following micro-benchmarks are from Python 2.5 on a Core Duo
machine. C0 is an old-style class, C1 is a new-style class, C2 is a
new-style class using __slots__:

# read access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.133 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.184 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib"
10000000 loops, best of 3: 0.161 usec per loop

# write access
$ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib = 1"
10000000 loops, best of 3: 0.15 usec per loop
$ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.217 usec per loop
$ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib = 1"
1000000 loops, best of 3: 0.209 usec per loop

$ more q.py
class C0:
    pass

class C1(object):
    pass

class C2(object):
    __slots__ = ["attrib"]

Your mileage may vary.
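For anyone re-running this today, the same read-access comparison can be done with the timeit module directly (class names mirror q.py above; absolute numbers will differ on modern interpreters):

```python
import timeit

class C1(object):
    pass

class C2(object):
    __slots__ = ['attrib']

o1 = C1()
o1.attrib = 1
o2 = C2()
o2.attrib = 1

# Attribute read cost, mirroring the shell timeit one-liners above.
t1 = timeit.timeit('o.attrib', globals={'o': o1}, number=1000000)
t2 = timeit.timeit('o.attrib', globals={'o': o2}, number=1000000)
print(t1, t2)
```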
> I'm looking into ways of speeding up HTML parsing via BeautifulSoup.
The solution to that is spelled "lxml".

</F>

Dec 21 '07 #15
My mileage does vary; see this older post:

<http://mail.python.org/pipermail/python-list/2004-May/261985.html>

Similar figures are shown with Python 2.5, both for 32- and 64-bit.

/Jean Brouwers
On Dec 21, 12:07 pm, Fredrik Lundh <fred...@pythonware.com> wrote:
> John Nagle wrote:
>> I'd like to hear more about what kind of performance gain can be
>> obtained from "__slots__". I'm looking into ways of speeding up
>> HTML parsing via BeautifulSoup. If a significant speedup can be
>> obtained when navigating large trees of small objects, that's worth
>> quite a bit to me.
>
> The following micro-benchmarks are from Python 2.5 on a Core Duo
> machine. C0 is an old-style class, C1 is a new-style class, C2 is a
> new-style class using __slots__:
>
> # read access
> $ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib"
> 10000000 loops, best of 3: 0.133 usec per loop
> $ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib"
> 10000000 loops, best of 3: 0.184 usec per loop
> $ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib"
> 10000000 loops, best of 3: 0.161 usec per loop
>
> # write access
> $ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib = 1"
> 10000000 loops, best of 3: 0.15 usec per loop
> $ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib = 1"
> 1000000 loops, best of 3: 0.217 usec per loop
> $ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib = 1"
> 1000000 loops, best of 3: 0.209 usec per loop
>
> $ more q.py
> class C0:
>     pass
>
> class C1(object):
>     pass
>
> class C2(object):
>     __slots__ = ["attrib"]
>
> Your mileage may vary.
>
>> I'm looking into ways of speeding up HTML parsing via BeautifulSoup.
>
> The solution to that is spelled "lxml".
>
> </F>
Dec 21 '07 #16
MrJean1 wrote:
> My mileage does vary, see this older post
>
> <http://mail.python.org/pipermail/python-list/2004-May/261985.html>
>
> Similar figures are shown with Python 2.5, both for 32- and 64-bit.
unless I'm missing something, you're measuring object creation time.

I'm measuring attribute access time (the topic was "navigating large
trees of small objects", not building them).

</F>

Dec 21 '07 #17
On Dec 21, 2007 2:07 PM, Fredrik Lundh <fr*****@pythonware.com> wrote:
> John Nagle wrote:
>> I'd like to hear more about what kind of performance gain can be
>> obtained from "__slots__". I'm looking into ways of speeding up
>> HTML parsing via BeautifulSoup. If a significant speedup can be
>> obtained when navigating large trees of small objects, that's worth
>> quite a bit to me.
>
> The following micro-benchmarks are from Python 2.5 on a Core Duo
> machine. C0 is an old-style class, C1 is a new-style class, C2 is a
> new-style class using __slots__:
>
> # read access
> $ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib"
> 10000000 loops, best of 3: 0.133 usec per loop
> $ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib"
> 10000000 loops, best of 3: 0.184 usec per loop
> $ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib"
> 10000000 loops, best of 3: 0.161 usec per loop
>
> # write access
> $ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib = 1"
> 10000000 loops, best of 3: 0.15 usec per loop
> $ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib = 1"
> 1000000 loops, best of 3: 0.217 usec per loop
> $ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib = 1"
> 1000000 loops, best of 3: 0.209 usec per loop
>
> $ more q.py
> class C0:
>     pass
>
> class C1(object):
>     pass
>
> class C2(object):
>     __slots__ = ["attrib"]
>
> Your mileage may vary.

Here are my timings of object creation time. Note that as you add
slots, the speed advantage disappears.

C:\>python -m timeit -s "from objs import C0 as c" "c()"
1000000 loops, best of 3: 0.298 usec per loop

C:\>python -m timeit -s "from objs import C1 as c" "c()"
10000000 loops, best of 3: 0.172 usec per loop

C:\>python -m timeit -s "from objs import C2 as c" "c()"
10000000 loops, best of 3: 0.164 usec per loop

C:\>python -m timeit -s "from objs import C3 as c" "c()"
1000000 loops, best of 3: 0.302 usec per loop

# objs.py
import string

class C0:
    pass

class C1(object):
    pass

class C2(object):
    __slots__ = ["foo"]

class C3(object):
    __slots__ = list(string.ascii_letters)
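The same instance-creation comparison can be run from a script with the timeit module instead of the command line; a rough sketch (absolute times will of course differ by machine and Python version):

```python
import timeit

class C1(object):
    pass

class C2(object):
    __slots__ = ('foo',)

# Time bare instance creation, as the command-line runs above do;
# timeit accepts any callable, so the classes can be passed directly.
t_plain = timeit.timeit(C1, number=100000)
t_slots = timeit.timeit(C2, number=100000)
print(t_plain, t_slots)
```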
Dec 21 '07 #18
You are correct. Mea culpa.

/Jean Brouwers

On Dec 21, 1:41 pm, Fredrik Lundh <fred...@pythonware.com> wrote:
> MrJean1 wrote:
>> My mileage does vary, see this older post
>> <http://mail.python.org/pipermail/python-list/2004-May/261985.html>
>> Similar figures are shown with Python 2.5, both for 32- and 64-bit.
>
> unless I'm missing something, you're measuring object creation time.
>
> I'm measuring attribute access time (the topic was "navigating large
> trees of small objects", not building them).
>
> </F>
Dec 21 '07 #19
Fredrik Lundh wrote:
> John Nagle wrote:
>> I'd like to hear more about what kind of performance gain can be
>> obtained from "__slots__". I'm looking into ways of speeding up
>> HTML parsing via BeautifulSoup. If a significant speedup can be
>> obtained when navigating large trees of small objects, that's worth
>> quite a bit to me.
>
> The following micro-benchmarks are from Python 2.5 on a Core Duo
> machine. C0 is an old-style class, C1 is a new-style class, C2 is a
> new-style class using __slots__:
>
> # read access
> $ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib"
> 10000000 loops, best of 3: 0.133 usec per loop
> $ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib"
> 10000000 loops, best of 3: 0.184 usec per loop
> $ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib"
> 10000000 loops, best of 3: 0.161 usec per loop
>
> # write access
> $ timeit -s "import q; o = q.C0(); o.attrib = 1" "o.attrib = 1"
> 10000000 loops, best of 3: 0.15 usec per loop
> $ timeit -s "import q; o = q.C1(); o.attrib = 1" "o.attrib = 1"
> 1000000 loops, best of 3: 0.217 usec per loop
> $ timeit -s "import q; o = q.C2(); o.attrib = 1" "o.attrib = 1"
> 1000000 loops, best of 3: 0.209 usec per loop
Not much of a win there. Thanks.
>> I'm looking into ways of speeding up HTML parsing via BeautifulSoup.
>
> The solution to that is spelled "lxml".
I may eventually have to go to a non-Python solution.
But I've finally made enough robustness fixes to BeautifulSoup that
it's usable on large numbers of real-world web sites. (Only two
exceptions in the last 100,000 web sites processed. If you want
to exercise your HTML parser on hard cases, run hostile-code web
sites through it.)

John Nagle
Dec 23 '07 #20
