By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
443,768 Members | 2,054 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 443,768 IT Pros & Developers. It's quick & easy.

Any problems with *lots* of attributes?

P: n/a
Hi all

This is a question about the performance trade-off between two methods
of storing data. One method is to create a separate attribute for each
piece of data, and get/set the value according to its name. Another
method is to create a list, and get/set the value according to its
position in the list. In my limited tests, the first method is
quicker.

My application is database-intensive. I have a Table class which is
instantiated for each table that is opened. I have a Column class
which is instantiated for each column in each table. The number of
attributes for each Column can be up to about 20. It is possible for
the number of column instances at any one time to run into the
hundreds, therefore the total number of attributes can run into
thousands.

At the moment I store the data for each column in a single Column
attribute, as a list. If I switch to creating a separate Column
attribute for each piece of data, is there a danger that the number of
attributes may eventually reach a point where any performance gain is
negated or even reversed?

Any advice will be much appreciated.

Frank Millman
Jul 18 '05 #1
Share this Question
Share on Google+
9 Replies


P: n/a
> This is a question about the performance trade-off between two methods
of storing data. One method is to create a separate attribute for each
piece of data, and get/set the value according to its name.


"Attributes of Objects" are essentially dictionaries. Access to elements of
dictionaries is O(1), meaning: not depending on the length of the
dictionary. (assuming there are no hash collisions)

You should try to compare performace with numeric indizes to Tuples instead
of lists, cause tuples are invariant they may be faster.

Jul 18 '05 #2

P: n/a
In article <24**************************@posting.google.com >,
Frank Millman <fr***@chagford.com> wrote:

This is a question about the performance trade-off between two methods
of storing data. One method is to create a separate attribute for each
piece of data, and get/set the value according to its name. Another
method is to create a list, and get/set the value according to its
position in the list. In my limited tests, the first method is quicker.


That doesn't make any sense. While lists and dicts are both essentially
O(1) for access, the constant is larger for dicts. Please show us your
testing code.

OTOH, there's certainly nothing wrong with using dicts; they are the
single most fundamental Python data type after strings. (All Python
name/attribute access is built on top of dicts.)
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
Jul 18 '05 #3

P: n/a
> Frank Millman <fr***@chagford.com> wrote:

This is a question about the performance trade-off between two methods
of storing data. One method is to create a separate attribute for each
piece of data, and get/set the value according to its name. Another
method is to create a list, and get/set the value according to its
position in the list. In my limited tests, the first method is quicker.

Aahz <aa**@pythoncraft.com> wrote:
That doesn't make any sense. While lists and dicts are both essentially
O(1) for access, the constant is larger for dicts. Please show us your
testing code.
Code shown below, but here is a summary -

print f.getval_from_attr()
print f.getval_from_list()
print f.setval_in_attr(1000)
print f.setval_in_list(1000)

Results, to 2 decimal places, are as follows -

0.32
0.45
0.34
0.44
OTOH, there's certainly nothing wrong with using dicts; they are the
single most fundamental Python data type after strings. (All Python
name/attribute access is built on top of dicts.)


This is exactly my point, but I may be misunderstanding how it works
internally. If I have a class with 20 attributes, it will have a
dictionary with 20 entries. If I have hundreds of instances of the
class, do I end up with hundreds of dictionaries (or the equivalent
internal structures), and if so, does this not lead to memory bloat?
If I store the data in a list, there will only be one dictionary entry
per instance. On the other hand, I suppose there must be some
reference to each of the elements of each list, so maybe there is an
equivalent overhead. Is this a valid concern, or am I worrying about
nothing?

Thanks for your input. Test program is shown below.

Frank

-------------------------------------

from time import time

class frank:
def __init__(self,a,b,c,l):
self.a = a # an attribute
self.b = b # an attribute
self.c = c # an attribute
self.l = l # a list

def getval_from_attr(self):
start = time()
for i in xrange(1000000):
aa = self.a
return (time() - start)

def getval_from_list(self):
start = time()
for i in xrange(1000000):
aa = self.l[0]
return (time() - start)

def setval_in_attr(self,val):
start = time()
for i in xrange(1000000):
self.a = val
return (time() - start)

def setval_in_list(self,val):
start = time()
for i in xrange(1000000):
self.l[0] = val
return (time() - start)

f = frank(100,200,300,[100,200,300])
print f.getval_from_attr()
print f.getval_from_list()
print f.setval_in_attr(1000)
print f.setval_in_list(1000)

-------------------------------------

Results, to 2 decimal places, are as follows -

0.32
0.45
0.34
0.44
Jul 18 '05 #4

P: n/a
Frank Millman wrote:

This is exactly my point, but I may be misunderstanding how it works
internally. If I have a class with 20 attributes, it will have a
dictionary with 20 entries. If I have hundreds of instances of the
class, do I end up with hundreds of dictionaries (or the equivalent
internal structures), and if so, does this not lead to memory bloat?
If I store the data in a list, there will only be one dictionary entry
per instance. On the other hand, I suppose there must be some
reference to each of the elements of each list, so maybe there is an
equivalent overhead.


20 entries * 100 instances = 2000 entries in 100 dictionaries
20 entries * 100 instances = 2000 list indexes in 100 lists

I really don't expect a big memory difference in normal use. I
personally would not worry about it until I actually experienced a
performance issue. Even before you code your app you know the size of
database you care about: why not simulate it and see how the performance is?

Paul Prescod

Jul 18 '05 #5

P: n/a

"Frank Millman" <fr***@chagford.com> wrote in message
news:24**************************@posting.google.c om...
Frank Millman <fr***@chagford.com> wrote:

This is a question about the performance trade-off between two methods
of storing data. One method is to create a separate attribute for each
piece of data, and get/set the value according to its name. Another
method is to create a list, and get/set the value according to its
position in the list. In my limited tests, the first method is quicker.

Aahz <aa**@pythoncraft.com> wrote:

That doesn't make any sense. While lists and dicts are both essentially
O(1) for access, the constant is larger for dicts. Please show us your
testing code.

Code shown below, but here is a summary -

print f.getval_from_attr()
print f.getval_from_list()
print f.setval_in_attr(1000)
print f.setval_in_list(1000)

Results, to 2 decimal places, are as follows -

0.32
0.45
0.34
0.44
OTOH, there's certainly nothing wrong with using dicts; they are the
single most fundamental Python data type after strings. (All Python
name/attribute access is built on top of dicts.)


This is exactly my point, but I may be misunderstanding how it works
internally. If I have a class with 20 attributes, it will have a
dictionary with 20 entries. If I have hundreds of instances of the
class, do I end up with hundreds of dictionaries (or the equivalent
internal structures), and if so, does this not lead to memory bloat?
If I store the data in a list, there will only be one dictionary entry
per instance. On the other hand, I suppose there must be some
reference to each of the elements of each list, so maybe there is an
equivalent overhead. Is this a valid concern, or am I worrying about
nothing?


I believe the memory overhead for dictionaries is 4 times that for
lists on a per item basis, however both data structures allocate
extra space so that they don't have to reallocate too frequently
(a very expensive operation.)

Thanks for your input. Test program is shown below.

Frank

Jul 18 '05 #6

P: n/a
In article <10*************@news.supernews.com>,
John Roth <ne********@jhrothjr.com> wrote:

I believe the memory overhead for dictionaries is 4 times that for
lists on a per item basis, however both data structures allocate
extra space so that they don't have to reallocate too frequently
(a very expensive operation.)


Should be only double; where do you get four times? (Note that I'm
talking strictly about the dict itself, not the space consumed by the
keys.)
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
Jul 18 '05 #7

P: n/a
In article <24**************************@posting.google.com >,
Frank Millman <fr***@chagford.com> wrote:
Aahz <aa**@pythoncraft.com> wrote:

OTOH, there's certainly nothing wrong with using dicts; they are the
single most fundamental Python data type after strings. (All Python
name/attribute access is built on top of dicts.)
This is exactly my point, but I may be misunderstanding how it works
internally. If I have a class with 20 attributes, it will have a
dictionary with 20 entries. If I have hundreds of instances of the
class, do I end up with hundreds of dictionaries (or the equivalent
internal structures), and if so, does this not lead to memory bloat?
If I store the data in a list, there will only be one dictionary entry
per instance. On the other hand, I suppose there must be some reference
to each of the elements of each list, so maybe there is an equivalent
overhead. Is this a valid concern, or am I worrying about nothing?


You're mostly worrying about nothing; your analysis is essentially
correct.
def getval_from_list(self):
start = time()
for i in xrange(1000000):
aa = self.l[0]
return (time() - start)


Try this instead:

def getval_from_list(self):
l = self.l
start = time()
for i in xrange(1000000):
aa = l[0]
return (time() - start)

Your original code pays the penalty for *both* dict/attribute lookup and
list lookup.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
Jul 18 '05 #8

P: n/a
aa**@pythoncraft.com (Aahz) writes:
In article <10*************@news.supernews.com>,
John Roth <ne********@jhrothjr.com> wrote:

I believe the memory overhead for dictionaries is 4 times that for
lists on a per item basis, however both data structures allocate
extra space so that they don't have to reallocate too frequently
(a very expensive operation.)


Should be only double; where do you get four times? (Note that I'm
talking strictly about the dict itself, not the space consumed by the
keys.)


The hash is stored in the dictentry struct, which gets you to three
times; I suppose it's possible alignment would get you four but I
don't think so.

Cheers,
mwh

--
I have no disaster recovery plan for black holes, I'm afraid. Also
please be aware that if it one looks imminent I will be out rioting
and setting fire to McDonalds (always wanted to do that) and
probably not reading email anyway. -- Dan Barlow
Jul 18 '05 #9

P: n/a
aa**@pythoncraft.com (Aahz) wrote in message:

You're mostly worrying about nothing; your analysis is essentially
correct.


Thanks a lot, everybody, for the replies. I will proceed on the
assumption that I do not need to worry about this.

I have to change a lot of things in my app if I switch from lists to
attributes, and I was nervous of doing all the changes and then
finding that I had a problem. I am now reasonably confident that I
will not have a problem, and the code should be cleaner and faster
when I have finished, so I will get stuck in.

Thanks again

Frank
Jul 18 '05 #10

This discussion thread is closed

Replies have been disabled for this discussion.