Scalability Code question - PHP vs MySQL

I'm having a tough time figuring out which of these two options is
best. This is a matter of processing my data in PHP vs. MySQL.
Usually that's a no-brainer, but I have a couple of gotchas here and
would love any and all opinions. I'm going to make this as short
and simple as I can...

This is for an e-commerce site with very high traffic, and the choice
will probably not be based on speed, but on which is more scalable. I
need this to last. So here's my test code. You may not know all
these functions, but I think they're very straightforward:

// 2 ways of doing this.. 1 query or more?
$start = microtime(true);
// searchTemp is a large table of denormalized data
$productSql = "SELECT * FROM $searchTemp $productWhere $sort";
// this just returns a multidimensional array of the results
$searchResults = $my->returnTableAssoc($productSql, $selectFromSlave);

// this is an array_unique for a multidimensional array and will
// essentially be like group_by productid
$products = remove_dups($searchResults, 'productid');
// get the other columns of data needed
$brands = array();
$cats = array();
$colors = array();
$years = array();
$bootWidth = array();
$flex = array();
foreach($searchResults as $sr)
{
    $brands[] = $sr['manufacturer'];
    $cats[] = $sr['categoryid'];
    $colors[] = $sr['colorcode'];
    $years[] = $sr['modelYear'];
    $bootWidth[] = $sr['bootWidth'];
    $flex[] = $sr['flexRating'];
}
$brands = array_unique($brands);
$cats = array_unique($cats);
$colors = array_unique($colors);
$years = array_unique($years);
$bootWidth = array_unique($bootWidth);
$flex = array_unique($flex);
$end = microtime(true);
echo "Did first in " . ($end - $start) . " seconds <br>";

// try again - just do a bunch of queries and let mysql do all the work
$productSql = "SELECT * FROM $searchTemp $productWhere GROUP BY productid $sort";
$products = $my->returnTableAssoc($productSql, $selectFromSlave);
$productSql = "SELECT distinct manufacturer FROM $searchTemp $productWhere";
$brands = $my->returnArray($productSql, $selectFromSlave);
$productSql = "SELECT distinct categoryid FROM $searchTemp $productWhere";
$cats = $my->returnArray($productSql, $selectFromSlave);
$productSql = "SELECT distinct colorcode FROM $searchTemp $productWhere";
$colors = $my->returnArray($productSql, $selectFromSlave);
$productSql = "SELECT distinct modelYear FROM $searchTemp $productWhere";
$years = $my->returnArray($productSql, $selectFromSlave);
$productSql = "SELECT distinct bootWidth FROM $searchTemp $productWhere";
$bootWidth = $my->returnArray($productSql, $selectFromSlave);
$productSql = "SELECT distinct flexRating FROM $searchTemp $productWhere";
$flex = $my->returnArray($productSql, $selectFromSlave);
$end = microtime(true);
echo "Did second in " . ($end - $start) . " seconds <br>";

So, on my development server, #1 runs in .9 seconds, and #2 runs in
3.7 seconds. However in my live production environment with 2
webservers and 2 database servers, they run at approx 1.1 seconds
each. It's essentially a tie.
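
A note on the timing code as posted: `$start` is set once at the top and never reset, so if it ran exactly as shown, the "second" figure also includes the time spent in the first block. A minimal sketch of timing each block on its own follows. The `time_block()` helper and the `usleep()` stand-ins are assumptions for illustration only; they are not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): runs one callable
// and returns array(result, elapsed seconds) with its own start time.
function time_block(callable $work)
{
    $start = microtime(true);          // fresh start for every block
    $result = $work();
    return array($result, microtime(true) - $start);
}

// Stand-ins for the two approaches; swap in the real query code.
list($first, $firstSecs) = time_block(function () {
    usleep(20000);                     // pretend approach #1 takes about 20 ms
    return 'approach 1';
});
list($second, $secondSecs) = time_block(function () {
    usleep(10000);                     // pretend approach #2 takes about 10 ms
    return 'approach 2';
});

printf("Did first in %.4f seconds\n", $firstSecs);
printf("Did second in %.4f seconds\n", $secondSecs);
```

With separate start times, neither measurement depends on how long the other block took.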

Another thing to keep in mind is whichever option I choose, I'll be
using memcache to speed things along also.

So, in short, both run at the same speed, but which one is more
scalable?

Thanks.
Aug 1 '08 #1
"rich" <rb*****@gmail.com> wrote in message
news:df**********************************@b1g2000hsg.googlegroups.com...
> I'm having a tough time figuring out which of these two options are
> best. This is a matter of processing my data in PHP, vs MySQL.
> Usually that's a no brainer, but I have a couple gotchyas here and
> would love any and all opinions here. I'm going to make this as short
> and simple as I can...
> <snip>
> So, in short, both run at the same speed, but which one is more
> scalable?
you've got to be kidding, right? if not, you're overlooking a lot of obvious
things. as much as possible, let a db do what it was designed to do. you
know very well your php scenario won't fly and has no chance of scaling!
right?
Aug 1 '08 #2
On Aug 1, 10:58 am, "Dale" <the....@example.com> wrote:
> you've got to be kidding, right? if not, you're overlooking a lot of obvious
> things. as much as possible, let a db do what it was designed to do. you
> know very well your php scenario won't fly and has no chance of scaling!
> right?

Well yeah, I know to let MySQL do the work. But I think this requires
more thought than a textbook answer. This is probably (from a
performance standpoint) the most important spot on the whole site and
I want to make sure this is done right. You could argue that from a
query queue point of view, running the 1 query is way faster. Really,
I guess the ONLY question here is who gets to figure out the distinct
values.

Also, I think memcache would be more useful in the PHP scenario. I can
just store the main query result and sort it out from there.

Don't get me wrong though... I'm leaning toward the MySQL way, I just
think this is important enough to get more opinions on before I charge
ahead.

Aug 1 '08 #3
On Aug 1, 11:31 am, rich <rbro...@gmail.com> wrote:
> Don't get me wrong though... I'm leaning toward the MySQL way, I just
> think this is important enough to get more opinions on before I charge
> ahead.

Couple more things if anyone else wants to chime in here...

I ran the test again just now, and I got:

Did first in 1.1379570960999 seconds
Did second in 5.2290420532227 seconds

When I said before the times were tied, I think I was being dumb and
the queries were cached. I'm pretty sure there's no way I can count
on these being cached live because of all the possible combinations.
So again.. option 1 uses far less mysql time.. AND if i stored that
main query in memcache, it'd be even faster. BUT - that sounds like a
lot of webserver memory usage. Not great. Ugh.

Aug 1 '08 #4

"rich" <rb*****@gmail.com> wrote in message
news:bd**********************************@56g2000hsm.googlegroups.com...
On Aug 1, 11:31 am, rich <rbro...@gmail.com> wrote:
Don't get me wrong though... I'm leaning toward the MySQL way, I just
think this is important enough to get more opinions on before I charge
ahead.
Couple more things if anyone else wants to chime in here...

I ran the test again just now, and I got:

Did first in 1.1379570960999 seconds
Did second in 5.2290420532227 seconds

When I said before the times were tied, I think I was being dumb and
the queries were cached. I'm pretty sure there's no way I can count
on these being cached live because of all the possible combinations.

== think again! and, those are pretty simple queries in your example. what
does your criteria look like?
Aug 1 '08 #5
On Aug 1, 12:00 pm, "Dale" <the....@example.com> wrote:
> == think again! and, those are pretty simple queries in your example. what
> does your criteria look like?

Since the table is just denormalized data, the queries are very
simple. Right now there's only one where clause, which just pulls the
products for the current category. From there, the worst it'll get is
"give me anything in red and size 11", etc. No multiple tables or joins here.
But this is why I say I don't think I can rely on the query cache.
For all the attributes we allow the user to filter on, there are
probably millions of combinations across all possible outcomes.
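
Whichever cache ends up in front of these searches, the number of distinct entries can at least be kept down by making sure the same set of filters always maps to the same key, regardless of the order the user picked them in. A minimal sketch, assuming a hypothetical `search_cache_key()` helper and made-up filter names; it only builds the key and does not talk to memcache or MySQL:

```php
<?php
// Hypothetical helper (not from the thread): builds one canonical cache
// key per filter set, so filter order does not create duplicate entries.
function search_cache_key(array $filters)
{
    ksort($filters);                   // order of the filters must not matter
    foreach ($filters as $name => $value) {
        if (is_array($value)) {
            sort($value);              // order of multi-select values must not matter
            $filters[$name] = $value;
        }
    }
    return 'search:' . md5(json_encode($filters));
}

// Same filters, different order: both calls produce the same key.
$a = search_cache_key(array('color' => 'red', 'size' => 11, 'categoryid' => 42));
$b = search_cache_key(array('categoryid' => 42, 'size' => 11, 'color' => 'red'));
var_dump($a === $b);
```

This does not shrink the space of genuinely different filter combinations; it only avoids storing the same combination twice.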
Aug 1 '08 #6
Rich,

Let the database do its job.

In general, you will get the best performance with a single SQL call
returning all of the data.

But you indicate the database is denormalized. Although denormalizing a
database can at times improve speed, it cuts the scalability of the
application, and as your database grows, it can actually slow down
performance because duplicate data being returned uses up the caches
much more quickly. So the absolute last thing you should do to improve
performance is to denormalize your database.

But this is getting too much off topic here. You can get more info on
this in comp.databases.mysql, as well as help in tuning your mysql system.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Aug 1 '08 #7

"rich" <rb*****@gmail.com> wrote in message
news:e3**********************************@m36g2000hse.googlegroups.com...
On Aug 1, 12:00 pm, "Dale" <the....@example.com> wrote:
== think again! and, those are pretty simple queries in your example. what
does your criteria look like?
Since the table is just denormalized data, the queries are very
simple. Right now there's only one where clause to just pull product
for the current category. From there, the worst it'll get is give me
anything in red and size 11, etc. No multiple tables or joins here.
But this is why I say I don't think I can rely on the query cache.
For all the attributes we allow the user to filter for, there's
probably millions of combinations across all possible outcomes.

== perhaps not for the millions, but i'm sure there will be several that are
recurring. even still, the question is scaling. as i said before, php cannot
cache and cannot index. php also has zero execution plan. further, you can
adjust a db's execution plan based on those recurring patterns of
user-defined criteria! if you're still talking about scaling and you're
still thinking php, you've just opted-out of three very key performance
enhancers.

also, do you need to select * from? why not optimize your query?

// first, you've specified only the columns you really need
// second, you've asked the db to make them as distinct
// as it can get it...

SELECT DISTINCT
    manufacturer AS brand     ,
    categoryid   AS category  ,
    colorCode    AS color     ,
    modelYear    AS modelYear ,
    bootWidth    AS width     ,
    flexRating   AS rating
FROM products
<< search criteria >>

// at this point, there is far less data
// that php has to churn through
// ($sql holds the query above; with mysql the array keys match
// the column aliases exactly as written in the query)
$manufacturers = array();
$categories = array();
$colors = array();
$modelYears = array();
$bootWidths = array();
$flexRatings = array();
$records = db::execute($sql);
foreach ($records as $record)
{
    // using the value as the key de-duplicates as we go
    $manufacturers[$record['brand']] = $record['brand'];
    $categories[$record['category']] = $record['category'];
    $colors[$record['color']] = $record['color'];
    $modelYears[$record['modelYear']] = $record['modelYear'];
    $bootWidths[$record['width']] = $record['width'];
    $flexRatings[$record['rating']] = $record['rating'];
}

now, what are your test results with these changes?
Aug 1 '08 #8

"Jerry Stuckle" <js*******@attglobal.net> wrote in message
news:0Y******************************@comcast.com...
> rich wrote:
<snip>
> But you indicate the database is denormalized. Although denormalizing a
> database can at times improve speed, it cuts the scalability of the
> application, and as your database grows, it can actually slow down
> performance because duplicate data being returned uses up the caches much
> more quickly. So the absolute last thing you should do to improve
> performance is to denormalize your database.
>
> But this is getting too much off topic here. You can get more info on
> this in comp.databases.mysql, as well as help in tuning your mysql system.
in case you missed it, jerry-berry, his POV is that he's got a PHP system
and not a mysql one, as you've put it. his post directly deals with PHP, and
it just so happens that mysql and apache often come up in the course of
discussion. get used to it. better yet, IGNORE OT CONTENT. god knows you go
OT at every possible opportunity!
Aug 1 '08 #9

"Jerry Stuckle" <js*******@attglobal.net> wrote in message
news:0Y******************************@comcast.com...
> rich wrote:
<snip>
> Let the database do its job.
>
> In general, you will get the best performance with a single SQL call
> returning all of the data.
>
> But you indicate the database is denormalized. Although denormalizing a
> database can at times improve speed, it cuts the scalability of the
> application, and as your database grows, it can actually slow down
> performance because duplicate data being returned uses up the caches much
> more quickly. So the absolute last thing you should do to improve
> performance is to denormalize your database.
i will say however, jerry, that i couldn't agree more here. rich should
notice that having 'manufacturers', 'categories', 'colors', and 'flex
ratings' in their own tables would reduce the number of rows to be scanned
by a ton. for the other columns they may be non-standard lookups, it means
that a SELECT DISTINCT over the product table just for those columns would
greatly increase the number of 'duplicates' returned to php. his individual
selects for mfg's, cat's, etc. should be lightning fast at that point too.

it doesn't happen often, but you actually gave good advice here. i'm
shocked! :^)
Aug 1 '08 #10

"Dale" <th*****@example.com> wrote in message
news:2u*****************@newsfe09.iad...
>
> "Jerry Stuckle" <js*******@attglobal.net> wrote in message
> news:0Y******************************@comcast.com...
>> rich wrote:
>
> <snip>
>> Let the database do its job.
>>
>> In general, you will get the best performance with a single SQL call
>> returning all of the data.
>>
>> But you indicate the database is denormalized. Although denormalizing a
>> database can at times improve speed, it cuts the scalability of the
>> application, and as your database grows, it can actually slow down
>> performance because duplicate data being returned uses up the caches much
>> more quickly. So the absolute last thing you should do to improve
>> performance is to denormalize your database.
>
> i will say however, jerry, that i couldn't agree more here. rich should
> notice that having 'manufacturers', 'categories', 'colors', and 'flex
> ratings' in their own tables would reduce the number of rows to be scanned
> by a ton. for the other columns they may be non-standard lookups, it means
> that a SELECT DISTINCT over the product table just for those columns would
> greatly increase the number of 'duplicates' returned to php

errrr...should read, 'greatly *decrease*'.

:)
Aug 1 '08 #11
Creating indexes for frequently-used combinations of search and result
columns will speed up case #2 and would be the most scalable approach.
Otherwise, a full table scan would be needed for each query. You can
use "explain" to check if your indexes are being used. An optional
refinement would be to create a stored proc that returns multiple sets
to avoid additional round trips between the DB and application.

In both cases you're storing all the matching products in an array, so
it looks like you expect this list to be a manageable size anyway and
presumably you'll be iterating over it. If so, and provided there
aren't too many duplicate entries in the DB or you can rework your DB
to eliminate these, approach #1 would be fine. You don't need
array_unique if you store the value in the key, e.g.:
"$brands[$sr['manufacturer']] = true;"
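
The key trick above can be sketched as follows. The rows and brand names are made-up sample data standing in for `$searchResults`; only the column names come from the original post.

```php
<?php
// Made-up sample rows standing in for $searchResults.
$searchResults = array(
    array('manufacturer' => 'BrandA', 'colorcode' => 'red'),
    array('manufacturer' => 'BrandB', 'colorcode' => 'red'),
    array('manufacturer' => 'BrandA', 'colorcode' => 'blue'),
);

$brands = array();
$colors = array();
foreach ($searchResults as $sr) {
    // Array keys are unique, so repeated values simply overwrite each other.
    $brands[$sr['manufacturer']] = true;
    $colors[$sr['colorcode']] = true;
}

// The distinct values are the keys, in first-seen order.
$brands = array_keys($brands);
$colors = array_keys($colors);
print_r($brands);
print_r($colors);
```

One caveat: PHP casts numeric-string keys such as '2008' to integers and NULL keys to the empty string, so columns like modelYear come back as integers and NULL values need handling if they matter.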

A third way, more common in batch apps, is to create a temp table with
the base results, then query this subset for each distinct field.

Regards,

John Peters

Aug 1 '08 #12
