472,986 Members | 3,007 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,986 software developers and data experts.

Any problems with *lots* of attributes?

Hi all

This is a question about the performance trade-off between two methods
of storing data. One method is to create a separate attribute for each
piece of data, and get/set the value according to its name. Another
method is to create a list, and get/set the value according to its
position in the list. In my limited tests, the first method is
quicker.

My application is database-intensive. I have a Table class which is
instantiated for each table that is opened. I have a Column class
which is instantiated for each column in each table. The number of
attributes for each Column can be up to about 20. It is possible for
the number of column instances at any one time to run into the
hundreds, therefore the total number of attributes can run into
thousands.

At the moment I store the data for each column in a single Column
attribute, as a list. If I switch to creating a separate Column
attribute for each piece of data, is there a danger that the number of
attributes may eventually reach a point where any performance gain is
negated or even reversed?

Any advice will be much appreciated.

Frank Millman
Jul 18 '05 #1
9 1160
> This is a question about the performance trade-off between two methods
of storing data. One method is to create a separate attribute for each
piece of data, and get/set the value according to its name.


"Attributes of Objects" are essentially dictionaries. Access to elements of
dictionaries is O(1), meaning: not depending on the length of the
dictionary. (assuming there are no hash collisions)

You should try to compare performace with numeric indizes to Tuples instead
of lists, cause tuples are invariant they may be faster.

Jul 18 '05 #2
In article <24**************************@posting.google.com >,
Frank Millman <fr***@chagford.com> wrote:

This is a question about the performance trade-off between two methods
of storing data. One method is to create a separate attribute for each
piece of data, and get/set the value according to its name. Another
method is to create a list, and get/set the value according to its
position in the list. In my limited tests, the first method is quicker.


That doesn't make any sense. While lists and dicts are both essentially
O(1) for access, the constant is larger for dicts. Please show us your
testing code.

OTOH, there's certainly nothing wrong with using dicts; they are the
single most fundamental Python data type after strings. (All Python
name/attribute access is built on top of dicts.)
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
Jul 18 '05 #3
> Frank Millman <fr***@chagford.com> wrote:

This is a question about the performance trade-off between two methods
of storing data. One method is to create a separate attribute for each
piece of data, and get/set the value according to its name. Another
method is to create a list, and get/set the value according to its
position in the list. In my limited tests, the first method is quicker.

Aahz <aa**@pythoncraft.com> wrote:
That doesn't make any sense. While lists and dicts are both essentially
O(1) for access, the constant is larger for dicts. Please show us your
testing code.
Code shown below, but here is a summary -

print f.getval_from_attr()
print f.getval_from_list()
print f.setval_in_attr(1000)
print f.setval_in_list(1000)

Results, to 2 decimal places, are as follows -

0.32
0.45
0.34
0.44
OTOH, there's certainly nothing wrong with using dicts; they are the
single most fundamental Python data type after strings. (All Python
name/attribute access is built on top of dicts.)


This is exactly my point, but I may be misunderstanding how it works
internally. If I have a class with 20 attributes, it will have a
dictionary with 20 entries. If I have hundreds of instances of the
class, do I end up with hundreds of dictionaries (or the equivalent
internal structures), and if so, does this not lead to memory bloat?
If I store the data in a list, there will only be one dictionary entry
per instance. On the other hand, I suppose there must be some
reference to each of the elements of each list, so maybe there is an
equivalent overhead. Is this a valid concern, or am I worrying about
nothing?

Thanks for your input. Test program is shown below.

Frank

-------------------------------------

from time import time

class frank:
def __init__(self,a,b,c,l):
self.a = a # an attribute
self.b = b # an attribute
self.c = c # an attribute
self.l = l # a list

def getval_from_attr(self):
start = time()
for i in xrange(1000000):
aa = self.a
return (time() - start)

def getval_from_list(self):
start = time()
for i in xrange(1000000):
aa = self.l[0]
return (time() - start)

def setval_in_attr(self,val):
start = time()
for i in xrange(1000000):
self.a = val
return (time() - start)

def setval_in_list(self,val):
start = time()
for i in xrange(1000000):
self.l[0] = val
return (time() - start)

f = frank(100,200,300,[100,200,300])
print f.getval_from_attr()
print f.getval_from_list()
print f.setval_in_attr(1000)
print f.setval_in_list(1000)

-------------------------------------

Results, to 2 decimal places, are as follows -

0.32
0.45
0.34
0.44
Jul 18 '05 #4
Frank Millman wrote:

This is exactly my point, but I may be misunderstanding how it works
internally. If I have a class with 20 attributes, it will have a
dictionary with 20 entries. If I have hundreds of instances of the
class, do I end up with hundreds of dictionaries (or the equivalent
internal structures), and if so, does this not lead to memory bloat?
If I store the data in a list, there will only be one dictionary entry
per instance. On the other hand, I suppose there must be some
reference to each of the elements of each list, so maybe there is an
equivalent overhead.


20 entries * 100 instances = 2000 entries in 100 dictionaries
20 entries * 100 instances = 2000 list indexes in 100 lists

I really don't expect a big memory difference in normal use. I
personally would not worry about it until I actually experienced a
performance issue. Even before you code your app you know the size of
database you care about: why not simulate it and see how the performance is?

Paul Prescod

Jul 18 '05 #5

"Frank Millman" <fr***@chagford.com> wrote in message
news:24**************************@posting.google.c om...
Frank Millman <fr***@chagford.com> wrote:

This is a question about the performance trade-off between two methods
of storing data. One method is to create a separate attribute for each
piece of data, and get/set the value according to its name. Another
method is to create a list, and get/set the value according to its
position in the list. In my limited tests, the first method is quicker.

Aahz <aa**@pythoncraft.com> wrote:

That doesn't make any sense. While lists and dicts are both essentially
O(1) for access, the constant is larger for dicts. Please show us your
testing code.

Code shown below, but here is a summary -

print f.getval_from_attr()
print f.getval_from_list()
print f.setval_in_attr(1000)
print f.setval_in_list(1000)

Results, to 2 decimal places, are as follows -

0.32
0.45
0.34
0.44
OTOH, there's certainly nothing wrong with using dicts; they are the
single most fundamental Python data type after strings. (All Python
name/attribute access is built on top of dicts.)


This is exactly my point, but I may be misunderstanding how it works
internally. If I have a class with 20 attributes, it will have a
dictionary with 20 entries. If I have hundreds of instances of the
class, do I end up with hundreds of dictionaries (or the equivalent
internal structures), and if so, does this not lead to memory bloat?
If I store the data in a list, there will only be one dictionary entry
per instance. On the other hand, I suppose there must be some
reference to each of the elements of each list, so maybe there is an
equivalent overhead. Is this a valid concern, or am I worrying about
nothing?


I believe the memory overhead for dictionaries is 4 times that for
lists on a per item basis, however both data structures allocate
extra space so that they don't have to reallocate too frequently
(a very expensive operation.)

Thanks for your input. Test program is shown below.

Frank

Jul 18 '05 #6
In article <10*************@news.supernews.com>,
John Roth <ne********@jhrothjr.com> wrote:

I believe the memory overhead for dictionaries is 4 times that for
lists on a per item basis, however both data structures allocate
extra space so that they don't have to reallocate too frequently
(a very expensive operation.)


Should be only double; where do you get four times? (Note that I'm
talking strictly about the dict itself, not the space consumed by the
keys.)
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
Jul 18 '05 #7
In article <24**************************@posting.google.com >,
Frank Millman <fr***@chagford.com> wrote:
Aahz <aa**@pythoncraft.com> wrote:

OTOH, there's certainly nothing wrong with using dicts; they are the
single most fundamental Python data type after strings. (All Python
name/attribute access is built on top of dicts.)
This is exactly my point, but I may be misunderstanding how it works
internally. If I have a class with 20 attributes, it will have a
dictionary with 20 entries. If I have hundreds of instances of the
class, do I end up with hundreds of dictionaries (or the equivalent
internal structures), and if so, does this not lead to memory bloat?
If I store the data in a list, there will only be one dictionary entry
per instance. On the other hand, I suppose there must be some reference
to each of the elements of each list, so maybe there is an equivalent
overhead. Is this a valid concern, or am I worrying about nothing?


You're mostly worrying about nothing; your analysis is essentially
correct.
def getval_from_list(self):
start = time()
for i in xrange(1000000):
aa = self.l[0]
return (time() - start)


Try this instead:

def getval_from_list(self):
l = self.l
start = time()
for i in xrange(1000000):
aa = l[0]
return (time() - start)

Your original code pays the penalty for *both* dict/attribute lookup and
list lookup.
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
Jul 18 '05 #8
aa**@pythoncraft.com (Aahz) writes:
In article <10*************@news.supernews.com>,
John Roth <ne********@jhrothjr.com> wrote:

I believe the memory overhead for dictionaries is 4 times that for
lists on a per item basis, however both data structures allocate
extra space so that they don't have to reallocate too frequently
(a very expensive operation.)


Should be only double; where do you get four times? (Note that I'm
talking strictly about the dict itself, not the space consumed by the
keys.)


The hash is stored in the dictentry struct, which gets you to three
times; I suppose it's possible alignment would get you four but I
don't think so.

Cheers,
mwh

--
I have no disaster recovery plan for black holes, I'm afraid. Also
please be aware that if it one looks imminent I will be out rioting
and setting fire to McDonalds (always wanted to do that) and
probably not reading email anyway. -- Dan Barlow
Jul 18 '05 #9
aa**@pythoncraft.com (Aahz) wrote in message:

You're mostly worrying about nothing; your analysis is essentially
correct.


Thanks a lot, everybody, for the replies. I will proceed on the
assumption that I do not need to worry about this.

I have to change a lot of things in my app if I switch from lists to
attributes, and I was nervous of doing all the changes and then
finding that I had a problem. I am now reasonably confident that I
will not have a problem, and the code should be cleaner and faster
when I have finished, so I will get stuck in.

Thanks again

Frank
Jul 18 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: Gil Tal | last post by:
Hi, I use urllib2 to download a redirected url and I get an exception from the bowels of urllib2. It seems that urllib2 implements some super sophisticated self check and tries to control the...
2
by: Brian | last post by:
NOTE ALSO POSTED IN microsoft.public.dotnet.framework.aspnet.buildingcontrols I have solved most of my Server Control Collection property issues. I wrote an HTML page that describes all of the...
6
by: Jack | last post by:
Hi there, I'm not sure if this the appropriate group so apologies if it lies outside the boundary. Senario: I have a customer table with contains a bunch of different bit values that represent...
4
by: Navu | last post by:
hi i have written a code in javascript and this code create lots of images at run time. Whenever page loads all these images get downloaded as 90% of images are same So is there any way so that...
17
by: Neil Cerutti | last post by:
The Glk API (which I'm implementing in native Python code) defines 120 or so constants that users must use. The constants already have fairly long names, e.g., gestalt_Version, evtype_Timer,...
3
by: Jayme Pechan | last post by:
I'm trying to do a simple web post to a web server with C# and am having lots of problems. Here is the code: try { XmlDocument docRequest = new XmlDocument(); XmlNode nodePush =...
6
by: Charlie Bear | last post by:
i'm really stuck with this one can anyone help! i have a website that uses c#. it creates a series of directories and files from an xml source. when the xml changes, the directory that the...
409
by: jacob navia | last post by:
I am trying to compile as much code in 64 bit mode as possible to test the 64 bit version of lcc-win. The problem appears now that size_t is now 64 bits. Fine. It has to be since there are...
5
by: Peter Otten | last post by:
Urizev wrote: Do you see it now I snipped the irrelevant output? The problem is the structure of your program. The myset module is imported twice by Python, once as "myset" and once as...
0
by: lllomh | last post by:
Define the method first this.state = { buttonBackgroundColor: 'green', isBlinking: false, // A new status is added to identify whether the button is blinking or not } autoStart=()=>{
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 4 Oct 2023 starting at 18:00 UK time (6PM UTC+1) and finishing at about 19:15 (7.15PM) The start time is equivalent to 19:00 (7PM) in Central...
0
tracyyun
by: tracyyun | last post by:
Hello everyone, I have a question and would like some advice on network connectivity. I have one computer connected to my router via WiFi, but I have two other computers that I want to be able to...
2
by: giovanniandrean | last post by:
The energy model is structured as follows and uses excel sheets to give input data: 1-Utility.py contains all the functions needed to calculate the variables and other minor things (mentions...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 1 Nov 2023 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM) Please note that the UK and Europe revert to winter time on...
3
by: nia12 | last post by:
Hi there, I am very new to Access so apologies if any of this is obvious/not clear. I am creating a data collection tool for health care employees to complete. It consists of a number of...
0
NeoPa
by: NeoPa | last post by:
Introduction For this article I'll be focusing on the Report (clsReport) class. This simply handles making the calling Form invisible until all of the Reports opened by it have been closed, when it...
0
isladogs
by: isladogs | last post by:
The next online meeting of the Access Europe User Group will be on Wednesday 6 Dec 2023 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, Mike...
4
by: GKJR | last post by:
Does anyone have a recommendation to build a standalone application to replace an Access database? I have my bookkeeping software I developed in Access that I would like to make available to other...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.