Large Data Sets: Use base variables or classes? And some binding questions

Hello.

I will be using some large data sets ("points" from 2 to 12 variables)
and would like to use one class for each point rather than a list or
dictionary. I imagine this is terribly inefficient, but how much?

What is the cost of creating a new class?

What is the cost of referencing a class variable?

What is the cost of calling a class method to just return a variable?

Key point: The point objects, once created, are essentially
non-mutable. Static. Is there a way to "bind" a variable to an object
method in a way that is more efficient than the function calling
self.variable_name ?

I'll run some profile tests later today but if anyone has any cost/
efficiency of object creation in python, or any other idioms related
to variable creation, I'd greatly appreciate some links.

Thanks!

Patrick
Sep 26 '08 #1
On 26 Sep, 16:39, Patrick Sullivan <psu...@gmail.com> wrote:
> Hello.
>
> I will be using some large data sets ("points" from 2 to 12 variables)
> and would like to use one class for each point rather than a list or
> dictionary. I imagine this is terribly inefficient, but how much?

I can't really get into details here, but I would suggest that you go
ahead and try first. As you know, premature optimization is the root
of all evil.

General points I would suggest:

- Use Numpy/Scipy (http://www.scipy.org). You will get better
efficiency, and get it more easily, than if you stick to plain Python
lists. It is also much easier to optimize later.
- Your questions of referencing classes and variables tell me that
perhaps you are starting from a C background, or Java maybe? Anyway,
as far as I know, it is not standard practice to write a class method
(you meant a normal bound method, right?) just to access a variable.
Use a normal Python variable and if you need to make it a method later
turn it into a property.
- Is the efficiency you are looking for in terms of time or memory?
That difference sometimes leads to different optimization tricks.
- By using Numpy there is probably another advantage to you: some
efficiency in the data representation, as the NumPy array stores data,
say integers, without memory overhead per member (point). Just an
array of integers. Of course there is additional constant memory per
array which is independent of the number of elements (points) you are
storing.
- Generally try to think in terms of arrays of data rather than single
points. If it helps, think in terms of matrices. That is more or less
the design of Matlab, and Numpy is more or less similar.
Now if you specify your problem further I am sure that you will get
better advice from the community here. Don't focus on the details;
the bigger picture will probably help more. Working in graphics? Image
processing? Machine Learning/Statistics/Data Mining/ etc..?
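
On the property suggestion in the list above, here is a minimal sketch in current Python 3. The class name `Point`, its attributes, and the "always report y as a float" requirement are all made up for illustration. The idea is that callers keep writing `p.y` whether `y` is a plain attribute or has quietly become a property later on.

```python
class Point:
    """Hypothetical point class used only for illustration."""

    def __init__(self, x, y):
        self.x = x    # plain attribute: no accessor method needed
        self._y = y   # kept private because y is now exposed as a property

    @property
    def y(self):
        # Assumed later requirement: y is always reported as a float.
        return float(self._y)


p = Point(1, 2)
print(p.x)  # plain attribute access
print(p.y)  # same syntax for the caller, but the property getter runs
```

Because no setter is defined, `p.y = 3` raises `AttributeError`, which happens to suit points that should not change after creation.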

--
Muhammad Alkarouri
Sep 26 '08 #2
Patrick Sullivan wrote:
> Hello.
>
> I will be using some large data sets ("points" from 2 to 12 variables)
> and would like to use one class for each point rather than a list or
> dictionary. I imagine this is terribly inefficient, but how much?

I strongly suspect that you should use one class and a class instance
for each 'point'. You can make instances 'fixed' after initialization
by customizing appropriate methods, but I would not bother for private code.
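
One possible reading of "customizing appropriate methods" is to override `__setattr__` and `__delattr__`. The sketch below assumes that reading. `FrozenPoint` and its two attributes are hypothetical names, and this is only one of several ways to get read-only instances.

```python
class FrozenPoint:
    """Hypothetical point class whose attributes are fixed after __init__."""

    def __init__(self, x, y):
        # Go through object.__setattr__ so that our own override below
        # does not block the initial assignment.
        object.__setattr__(self, "x", x)
        object.__setattr__(self, "y", y)

    def __setattr__(self, name, value):
        raise AttributeError("FrozenPoint instances are read-only")

    def __delattr__(self, name):
        raise AttributeError("FrozenPoint instances are read-only")


p = FrozenPoint(1.0, 2.0)
try:
    p.x = 5.0
except AttributeError as exc:
    print("rejected:", exc)
```

Note that the extra method lookup makes every assignment attempt slower, so this buys safety, not speed.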

Sep 26 '08 #3
On Sep 26, 11:39 am, Patrick Sullivan <psu...@gmail.com> wrote:
> Hello.

Hi, I have a couple suggestions.

> I will be using some large data sets ("points" from 2 to 12 variables)
> and would like to use one class for each point rather than a list or
> dictionary.

Ok, point of terminology. It's not really a nit-pick, either, since
it affects some of your questions below. When you say you want to use
one class for each point, you apparently mean you would like to use
one class instance, or one object, for each point.

One class for each point would be terribly inefficient; one instance,
perhaps not.

> I imagine this is terribly inefficient, but how much?

You say large data sets, which suggests that the __slots__ mechanism
could be useful to you.

class A(object):
    __slots__ = ['var1','var2','var3']

Normally, each class instance has an associated dict which stores the
attributes, but if you define __slots__ then the variables will be
stored in fixed memory locations and no dict will be created.
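
To make that concrete, here is a small sketch built on the class above, written for current Python 3. The `__init__` method and the sample values are additions for illustration only.

```python
class A(object):
    __slots__ = ['var1', 'var2', 'var3']

    def __init__(self, var1, var2, var3):
        # Illustrative constructor; not part of the snippet above.
        self.var1 = var1
        self.var2 = var2
        self.var3 = var3


a = A(1, 2, 3)

# Slotted instances carry no per-instance attribute dict.
print(hasattr(a, '__dict__'))

# Only the names listed in __slots__ can be assigned.
try:
    a.var4 = 4
except AttributeError:
    print("cannot add attributes that are not listed in __slots__")
```

The second check shows the main trade-off: you give up the ability to attach arbitrary attributes to an instance.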

However, it seems from the rest of your comments that speed is your
main concern. Last time someone reported __slots__ didn't make a big
difference in access time, but it probably would speed up creating
objects a bit. Of course, you should profile it to make sure.

> What is the cost of creating a new class?

I'm assuming you want to know the cost of creating a class instance.
Generally speaking, the main cost of this is that you'd be executing
Python code (whereas list and dict are written in C).

> What is the cost of referencing a class variable?

I assume you mean an instance variable.

> What is the cost of calling a class method to just return a variable?

Significant penalty.

This is because even if the method call is faster (and I doubt very
highly that it is), the method still has to access the variable, which
is going to take the same amount of time as accessing the variable
directly. I.e., you're getting the overhead of a method call to do
the same thing you could have done directly.
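
If you want to measure that overhead on your own machine, a rough sketch with the standard `timeit` module could look like the following. `Point` and `get_x` are hypothetical names, and the loop count is arbitrary; absolute numbers will vary with interpreter version and hardware.

```python
import timeit


class Point:
    """Hypothetical class with a Java-style accessor, for timing only."""

    def __init__(self, x):
        self.x = x

    def get_x(self):
        # Does nothing except return the attribute.
        return self.x


p = Point(1.0)
loops = 200_000

direct = timeit.timeit("p.x", globals={"p": p}, number=loops)
via_method = timeit.timeit("p.get_x()", globals={"p": p}, number=loops)

print(f"direct attribute access: {direct:.4f} s for {loops} loops")
print(f"accessor method call:    {via_method:.4f} s for {loops} loops")
```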

I highly recommend against doing this, not only because it's less
efficient, but also because it's considered bad style in Python.

> Key point: The point objects, once created, are essentially
> non-mutable. Static. Is there a way to "bind" a variable to an object
> method in a way that is more efficient than the function calling
> self.variable_name ?

Python 2.6 has a new object type called namedtuple in the collections
module. (Actually it's a type factory that creates a subclass of
tuple with attribute names mapped to the indices.) This might be a
perfect fit for your needs. You have to upgrade to 2.6, though, which
won't be released for a few days.
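
A minimal sketch of how that might look for a three-variable point, in current Python 3 syntax. The type name `Point` and the field names are only examples.

```python
from collections import namedtuple

# The factory returns a new tuple subclass with named fields.
Point = namedtuple("Point", ["x", "y", "z"])

p = Point(1.0, 2.0, 3.0)
print(p.x, p[1])   # access by name or by index
x, y, z = p        # unpacks like any tuple

# Fields cannot be rebound on an existing instance.
try:
    p.x = 10.0
except AttributeError:
    print("fields are read-only")

# _replace builds a modified copy instead of mutating in place.
q = p._replace(x=10.0)
print(q)
```

Since the instances are real tuples, they are created by C code and carry no per-instance dict, which fits the "static once created" requirement.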
Carl Banks

Sep 26 '08 #4
On Fri, 26 Sep 2008 14:54:36 -0700, Carl Banks wrote:
> However, it seems from the rest of your comments that speed is your main
> concern. Last time someone reported __slots__ didn't make a big
> difference in access time, but it probably would speed up creating
> objects a bit.

Carl probably knows this already, but for the benefit of the Original
Poster:

__slots__ is intended as a memory optimization, not speed optimization.
If it speeds up creation, that's a serendipitous side-effect of using
less memory.
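
For anyone who wants to see the memory side for themselves, here is a rough sketch using the standard `tracemalloc` module. `PlainPoint`, `SlottedPoint`, `bytes_allocated`, and the instance count are all made up for illustration, and the exact byte counts depend on the Python version.

```python
import tracemalloc


class PlainPoint:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


class SlottedPoint:
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


def bytes_allocated(cls, count=50_000):
    """Return the bytes tracemalloc attributes to building `count` instances."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    points = [cls(1.0, 2.0, 3.0) for _ in range(count)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


plain_bytes = bytes_allocated(PlainPoint)
slotted_bytes = bytes_allocated(SlottedPoint)
print(f"plain:   {plain_bytes / 50_000:.1f} bytes per point")
print(f"slotted: {slotted_bytes / 50_000:.1f} bytes per point")
```

Both figures include the list that holds the instances, so the difference between them is the interesting part.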

> Of course, you should profile it to make sure.

Absolutely.

Can I ask the OP how large is "large" in the Large Data Sets? What seems
large to people is often not large at all to a modern computer.

--
Steven
Sep 26 '08 #5
On Sep 26, 7:43 pm, Steven D'Aprano
<st...@REMOVE-THIS-cybersource.com.au> wrote:
> On Fri, 26 Sep 2008 14:54:36 -0700, Carl Banks wrote:
>> However, it seems from the rest of your comments that speed is your main
>> concern. Last time someone reported __slots__ didn't make a big
>> difference in access time, but it probably would speed up creating
>> objects a bit.
>
> Carl probably knows this already, but for the benefit of the Original
> Poster:
>
> __slots__ is intended as a memory optimization, not speed optimization.
> If it speeds up creation, that's a serendipitous side-effect of using
> less memory.

No, it'd be a serendipitous side-effect of not having to take the time
to create a dict object, which is quite a bit more of a direct cause.

It might still end up being slower (creating slot descriptors might
take more time for all I know) but it's more than just an effect of
less memory.

Carl Banks
Sep 27 '08 #6
On Sep 26, 8:53 pm, Carl Banks <pavlovevide...@gmail.com> wrote:
> It might still end up being slower (creating slot descriptors might
> take more time for all I know) but it's more than just an effect of
> less memory.

Actually scratch that. Descriptors are only created when the type
object is created. I can't think of anything that would need to be
done in an instance only if no dict is present, so using slots
almost certainly makes object creation faster. Still, the
last word is the profiler.
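
In that spirit, a quick way to time instance creation with and without slots is sketched below using the standard `timeit` module. The class names and loop count are arbitrary choices for illustration, and the result may well differ between Python versions.

```python
import timeit


class PlainPoint:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


class SlottedPoint:
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


loops = 200_000
timings = {}
for cls in (PlainPoint, SlottedPoint):
    # Time only the construction of one instance per loop.
    timings[cls.__name__] = timeit.timeit(
        lambda: cls(1.0, 2.0, 3.0), number=loops
    )

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for {loops} instances")
```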
Carl Banks
Sep 27 '08 #7
