By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,347 Members | 1,390 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,347 IT Pros & Developers. It's quick & easy.

Data objects vs. speed

P: n/a
I have to create a big web application and I was thinking of using a
data layer. For each entity in the database, I'll define a class that
maps the table structure, having sub-objects for each foreign key,
having insert/delete/update methods, the usual deal. Yet, I am very
concerned about performance. For example, there are lots of cases when
I may just be needing the employee name. Yet using this model, I will
have to instantiate an entire Employee class, which may have sub-
objects and use lots of SELECT queries. I think this will be an
important performance hit. What do you think? Ever did something like
this? Is it worthy? Is there a better way? Should I give up creating
the data layer?

Feb 19 '07 #1
Share this Question
Share on Google+
8 Replies


P: n/a
I have just implemented something similar.

You can structure the SELECTs in such a way that they are only run if
the relavent information is requested from the class via a get method.
For instance, if any of the methods requesting user information are
called, a SELECT is run which gets all user information (since all
info is on the same row - might as well get it all). Any further
requests then give up this cached info rather than another SELECT.

However, I share your concern with having to instantiate a large class
for even simple queries. I am a newbie at PHP (C++ coder by trade) and
am only now reading about the performance overheard of function calls
and the fact that PHP will have to parse the entire class code.

Is it worth the trade off? I don't know. I has certainly made the
coding a whole lot easier and more structured. I am now implementing
cacheing certain data in the session, and this is a snip as everything
is accessed via the same class.

I will read this thread with interest.


Feb 19 '07 #2

P: n/a
rp******@yahoo.com wrote:
I have to create a big web application and I was thinking of using a
data layer. For each entity in the database, I'll define a class that
maps the table structure, having sub-objects for each foreign key,
having insert/delete/update methods, the usual deal. Yet, I am very
concerned about performance. For example, there are lots of cases when
I may just be needing the employee name. Yet using this model, I will
have to instantiate an entire Employee class, which may have sub-
objects and use lots of SELECT queries. I think this will be an
important performance hit. What do you think? Ever did something like
this? Is it worthy? Is there a better way? Should I give up creating
the data layer?
I do objects in all my more complex sites. But I generally use business
objects - ones related to the job at hand. Any business object may be
related to a few columns in one table, most columns across multiple
tables, or anything in between.

And I may even use the same object, but have different fetch methods
which retrieve only partial data.

For instance, if I have:

table student
id int
name char(30)
other stuff

table class
id int
name char
instructor_id int
other stuff

table class_student
class_id int
student_id int

I might have a class which will retrieve all the student names, the
instructor name, day(s) and start/s for a single class, the room in
which it will be held, etc.. Alternatively, it might only retrieve a
count of the number of students in the class

If the data is getting more complex, I may add a pure data layer under
the business object layer, where I do have one table per object. But
this layer will also have multiple functions to allow me to retrieve
what I need, when I need it.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Feb 19 '07 #3

P: n/a
lister wrote:
I have just implemented something similar.

You can structure the SELECTs in such a way that they are only run if
the relavent information is requested from the class via a get method.
For instance, if any of the methods requesting user information are
called, a SELECT is run which gets all user information (since all
info is on the same row - might as well get it all). Any further
requests then give up this cached info rather than another SELECT.

However, I share your concern with having to instantiate a large class
for even simple queries. I am a newbie at PHP (C++ coder by trade) and
am only now reading about the performance overheard of function calls
and the fact that PHP will have to parse the entire class code.

Is it worth the trade off? I don't know. I has certainly made the
coding a whole lot easier and more structured. I am now implementing
cacheing certain data in the session, and this is a snip as everything
is accessed via the same class.

I will read this thread with interest.

In a big class where you need lots of data, this would be a performance
disaster. Some of my tables have 40+ columns - and making 40+ SELECT
statements, each fetching one item, is a tremendous waste of time.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Feb 19 '07 #4

P: n/a
In a big class where you need lots of data, this would be a performance
disaster. Some of my tables have 40+ columns - and making 40+ SELECT
statements, each fetching one item, is a tremendous waste of time.
I think you misunderstand me. There is only one select statement per
table. Yes, I could have 40 columns, and 40 different get*() methods,
but only the first call to one of those methods would result in a
SELECT *. Then all 40 pieces of information would be populated in the
class, and any subsequent get*() calls would simply return the value.

Feb 19 '07 #5

P: n/a
lister wrote:
>In a big class where you need lots of data, this would be a performance
disaster. Some of my tables have 40+ columns - and making 40+ SELECT
statements, each fetching one item, is a tremendous waste of time.

I think you misunderstand me. There is only one select statement per
table. Yes, I could have 40 columns, and 40 different get*() methods,
but only the first call to one of those methods would result in a
SELECT *. Then all 40 pieces of information would be populated in the
class, and any subsequent get*() calls would simply return the value.
OK, that's better. But it's also a lot of unnecessary overhead to fetch
40 columns if you only need 3.

Additionally, you should never use SELECT * - put in the names of the
columns instead. For instance, what happens if someone adds a 10Mb BLOB
field to your table and you SELECT *?

As I indicated above, a better compromise between getting everything and
getting just one field is to have more than one fetch method to allow
fetching of multiple columns.

For instance, on one customer I have one table (companies they sell to)
with 42 columns (and yes, it is normalized). One fetch might get
company name and id. Another might get this plus address and phone
info. A third one might get all of this plus current financial info.

And since this is a business object, I could have a method which fetches
outstanding transactions. Another can fetch all transactions. These
and others like them link to other tables - but that is transparent to
the program itself.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Feb 19 '07 #6

P: n/a
rpsetzer wrote:
Yet, I am very concerned about performance. For example, there are lots
of cases when I may just be needing the employee name.
The approach I'm using in a current project is along these lines:

public function load_from_database($keys) {...}
public function load_from_array($vals) {...}

Such that the class can be populated from either the database, or from an
array. It can be partially populated, so that, for example, you can load
the Employee ID and Employee Name from an array like so:

$e = new Employee();
$e->load_from_array(array('id'=>1, 'name'=>'Joe'));

where the other fields like address, telephone number, etc would remain
blank within the object, but if they were asked for:

print $e->get('phone');

then the Employee object is smart enough to grab all the missing data from
the database and return the appropriate value.

So for example:

$db = new PDO(...);
foreach ($db->query('SELECT id, name FROM employees') as $row)
{
$e = new Employee();
$e->load_from_array($row);
print $e->get('name');
if ($e->get('name')=='Joe')
print ' '.$e->get('phone');
print "\n";
}

will work as expected, printing a list of employees, including phone
numbers for employees called Joe. Yes, that's right -- even when the
initial query didn't select phone numbers!

When you perform the get('phone') call, $e realises that 'phone' is NULL,
so performs its own call to load_from_database, which runs 'SELECT * FROM
employees WHERE id={$this->id}' and then feeds the results into
load_from_array.

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!
Feb 19 '07 #7

P: n/a
Hi,

Of course it depends on what you need to do and how much processing
power you have at hand. For some purposes using a relational database
may itself be a bad idea. But for most apps data objects can work fine,
if you implement caching. When i developed the first version of
phpPeanuts i wanted it to be the simpelest thing that could possebly
work so i did not include caching. But when i tested my first serious
app with it, caching was the first thing to be added. Without it it
simply did not perform. With caching the app only fired 1/4th to 1/10th
of the queries, this solved the performance problem. I implemented
caching in the framework and i have not yet had to develop an app for
wchich it was not sufficient.

To implement caching each object type needs to have a key. For each
object that is retrieved from the database a reference* is added to the
cache, using its key as key in an associative array. Associative arrays
are great for caching because they use hashed lookup to quickly retrieve
objects. With hashed lookup a there is very little searching, and the
searching does not increase substantially with large arrays.

When navigating over a '1-n' relationship in the '1' direction you have
the (now foreign) key, so you first look in the cache to see if the
object has already been loaded. If it has not, you load it and cache it
and return a reference. This mechanism is so fast that it is not needed
to store the reference to a related object in a member variable of the
relating object. This saves you a big headache with circular references
that tend to make php unstable.

When navigating over an 1-n relationship in the n direction the chache
is of little help**. Therefore it is generally a good idea to store each
array of objects you retrieve in a member variable of the relating object.

BTW, you do not have to implement insert/delete/update methods for each
object type, with metadata you can implement generic insert, delete and
update methods so you only need three methods however many types you
make. Or you could download a framework that does all this ;-).

Greetings,

Henk Verhoeven,
www.phpPeanuts.org.

BTW phpPeanuts does not work if you need arbitrary keys. It allways
needs 'id' to be the key. Simpelest thing...

* in php5 it is not necessary to think about references, but in php4 you
may get a subtantial performance gain by using variable references here.
** However, if an object is stored in multiple tables but you do not
know in advance in which ones (like with polymorphism) the cache may
save you an extra query to retrieve additional data.
Feb 20 '07 #8

P: n/a

<rp******@yahoo.comwrote in message
news:11**********************@v33g2000cwv.googlegr oups.com...
>I have to create a big web application and I was thinking of using a
data layer. For each entity in the database, I'll define a class that
maps the table structure, having sub-objects for each foreign key,
having insert/delete/update methods, the usual deal. Yet, I am very
concerned about performance. For example, there are lots of cases when
I may just be needing the employee name. Yet using this model, I will
have to instantiate an entire Employee class, which may have sub-
objects and use lots of SELECT queries. I think this will be an
important performance hit. What do you think? Ever did something like
this? Is it worthy? Is there a better way? Should I give up creating
the data layer?
Having a single class for each database table is a good idea - I have been
using it for years. The notion of having a subclass for each foreign key is
a definite no-no. It is totally unnecessary and a waste of time. Nor do you
need a separate class method for each possible SELECT ... WHERE ... as it is
possible to have a generic getData($where) method where the $where argument
is a string which is provided at runtime. This can cover all eventualities.

It does not matter that you have to instantiate the EMPLOYEE class
containing 40 columns if you only want a single column. In my methodology
the default is SELECT *, but a specific list of column names can be provided
for individual queries if required. This removes the need for a different
class method for each combination of SELECT ... and WHERE ...

One thin that you should notice is that 95% of the code in a database table
class is common to all database tables, therefore this common code can be
put into a superclass and inherited by individual table classes. This means
that each individual table class need contain no more than the following:
(1) database engine name
(2) database name
(3) table name
(4) database structure (column names, primary key names, candidate key
names, relationships with other tables)
(5) custom business rules

If you are really clever you can put your database APIs in a separate class
so that you can switch database engines (MySQL, PostgreSQL or Oracle) by
changing a single line of code.

The bottom line is that if you use OO classes intelligently you can save a
lot of repetitive coding, must don't waste time trying to create a complex
class hierarchy - it just ain't worth it.

--
Tony Marston
http://www.tonymarston.net
http://www.radicore.org
Feb 20 '07 #9

This discussion thread is closed

Replies have been disabled for this discussion.