Scalability Code question - PHP vs MySQL

rich

I'm having a tough time figuring out which of these two options is
best. This is a matter of processing my data in PHP vs. MySQL.
Usually that's a no-brainer, but I have a couple of gotchas here and
would love any and all opinions. I'm going to make this as short
and simple as I can...

This is for an e-commerce site with very high traffic, and the choice
will probably not be based on speed, but on which is more scalable. I
need this to last. So here's my test code. You may not know all
these functions, but I think they're very straightforward:

// 2 ways of doing this.. 1 query or more?
$start = microtime(true);
// searchTemp is a large table of denormalized data
$productSql = "SELECT * FROM $searchTemp $productWhere $sort";
// this just returns a multidimensional array of the results
$searchResults = $my->returnTableAssoc($productSql, $selectFromSlave);

// this is an array_unique for a multidimensional array and will
// essentially be like group_by productid
$products = remove_dups($searchResults, 'productid');
// get the other columns of data needed
$brands = array();
$cats = array();
$colors = array();
$years = array();
$bootWidth = array();
$flex = array();
foreach ($searchResults as $sr)
{
    $brands[] = $sr['manufacturer'];
    $cats[] = $sr['categoryid'];
    $colors[] = $sr['colorcode'];
    $years[] = $sr['modelYear'];
    $bootWidth[] = $sr['bootWidth'];
    $flex[] = $sr['flexRating'];
}
$brands = array_unique($brands);
$cats = array_unique($cats);
$colors = array_unique($colors);
$years = array_unique($years);
$bootWidth = array_unique($bootWidth);
$flex = array_unique($flex);
$end = microtime(true);
echo "Did first in " . ($end - $start) . " seconds <br>";

// try again - just do a bunch of queries and let mysql do all the work
// reset the timer so this measurement does not include the first run
$start = microtime(true);
$productSql = "SELECT * FROM $searchTemp $productWhere GROUP BY productid $sort";
$products = $my->returnTableAssoc($productSql, $selectFromSlave);
$productSql = "SELECT DISTINCT manufacturer FROM $searchTemp $productWhere";
$brands = $my->returnArray($productSql, $selectFromSlave);
$productSql = "SELECT DISTINCT categoryid FROM $searchTemp $productWhere";
$cats = $my->returnArray($productSql, $selectFromSlave);
$productSql = "SELECT DISTINCT colorcode FROM $searchTemp $productWhere";
$colors = $my->returnArray($productSql, $selectFromSlave);
$productSql = "SELECT DISTINCT modelYear FROM $searchTemp $productWhere";
$years = $my->returnArray($productSql, $selectFromSlave);
$productSql = "SELECT DISTINCT bootWidth FROM $searchTemp $productWhere";
$bootWidth = $my->returnArray($productSql, $selectFromSlave);
$productSql = "SELECT DISTINCT flexRating FROM $searchTemp $productWhere";
$flex = $my->returnArray($productSql, $selectFromSlave);
$end = microtime(true);
echo "Did second in " . ($end - $start) . " seconds <br>";

So, on my development server, #1 runs in .9 seconds, and #2 runs in
3.7 seconds. However in my live production environment with 2
webservers and 2 database servers, they run at approx 1.1 seconds
each. It's essentially a tie.
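
When comparing two variants like this, each one needs its own start time, otherwise the second figure also contains the first run. A minimal sketch of a timing helper, assuming PHP 7 or later; `time_call` is a hypothetical name and the two closures are stand-in workloads, not the real query code:

```php
<?php
// Hypothetical helper: run a callable and time it on its own clock.
function time_call(callable $fn): array
{
    $start = microtime(true);            // fresh start for every measurement
    $result = $fn();
    $elapsed = microtime(true) - $start;
    return [$result, $elapsed];
}

// Stand-in workloads; swap in the two real code paths being compared.
list($r1, $t1) = time_call(function () { return array_sum(range(1, 1000)); });
list($r2, $t2) = time_call(function () { return array_sum(range(1, 2000)); });

printf("Did first in %.6f seconds\n", $t1);
printf("Did second in %.6f seconds\n", $t2);
```

Running each variant several times and with a cold query cache would give a fairer picture than a single pass.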

Another thing to keep in mind is whichever option I choose, I'll be
using memcache to speed things along also.

So, in short, both run at the same speed, but which one is more
scalable?

Thanks.

Aug 1 '08 #1

Dale

"rich" <rb*****@gmail.com> wrote in message
news:df**********************************@b1g2000hsg.googlegroups.com...

I'm having a tough time figuring out which of these two options is
best. This is a matter of processing my data in PHP vs. MySQL.
Usually that's a no-brainer, but I have a couple of gotchas here and
would love any and all opinions. I'm going to make this as short
and simple as I can...

<snip>

So, in short, both run at the same speed, but which one is more
scalable?

you've got to be kidding, right? if not, you're overlooking a lot of obvious
things. as much as possible, let a db do what it was designed to do. you
know very well your php scenario won't fly and has no chance of scaling!
right?

Aug 1 '08 #2

rich

On Aug 1, 10:58 am, "Dale" <the....@example.com> wrote:

you've got to be kidding, right? if not, you're overlooking a lot of obvious
things. as much as possible, let a db do what it was designed to do. you
know very well your php scenario won't fly and has no chance of scaling!
right?

Well yeah, I know to let MySQL do the work. But I think this requires
more thought than a textbook answer. This is probably (from a
performance standpoint) the most important spot on the whole site and
I want to make sure this is done right. You could argue that from a
query queue point of view, running the 1 query is way faster. Really,
I guess the ONLY question here is who gets to figure out the distinct
values.

Also, I think memcache would be more useful in the PHP scenario. I can
just store the main query result and sort it out from there.

Don't get me wrong though... I'm leaning toward the MySQL way, I just
think this is important enough to get more opinions on before I charge
ahead.

Aug 1 '08 #3

rich

On Aug 1, 11:31 am, rich <rbro...@gmail.com> wrote:

Don't get me wrong though... I'm leaning toward the MySQL way, I just
think this is important enough to get more opinions on before I charge
ahead.

Couple more things if anyone else wants to chime in here...

I ran the test again just now, and I got:

Did first in 1.1379570960999 seconds
Did second in 5.2290420532227 seconds

When I said before the times were tied, I think I was being dumb and
the queries were cached. I'm pretty sure there's no way I can count
on these being cached live because of all the possible combinations.
So again.. option 1 uses far less MySQL time.. AND if I stored that
main query result in memcache, it'd be even faster. BUT - that sounds
like a lot of webserver memory usage. Not great. Ugh.
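
One way to make a result cache pay off despite the many filter combinations is to build the cache key from a normalized filter set, so that equivalent searches share one entry. A minimal sketch, assuming PHP 7 or later; `filter_cache_key` and `cached_search` are hypothetical names, and a plain array stands in for a real memcache client:

```php
<?php
// Build a cache key that is identical for equivalent filter sets.
function filter_cache_key(array $filters): string
{
    ksort($filters);                        // order of the filters must not matter
    foreach ($filters as $name => $value) {
        if (is_array($value)) {
            sort($value);                   // order of multi-select values must not matter
            $filters[$name] = $value;
        }
    }
    return 'search:' . md5(json_encode($filters));
}

// Look the key up first; run the expensive loader only on a miss.
function cached_search(array &$cache, array $filters, callable $loader)
{
    $key = filter_cache_key($filters);
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $loader($filters);
    }
    return $cache[$key];
}

$cache = [];    // plain array standing in for a memcache client (assumption)
$calls = 0;     // counts how often the "database" is actually hit
$loader = function (array $filters) use (&$calls) {
    $calls++;
    return ['rows for ' . implode(',', array_keys($filters))];
};

$a = cached_search($cache, ['color' => 'red', 'size' => 11], $loader);
$b = cached_search($cache, ['size' => 11, 'color' => 'red'], $loader);
```

Here the second lookup is served from the cache even though the filters arrive in a different order. With a real memcache server the stored value would also need an expiry time and a size limit, which this sketch leaves out.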

Aug 1 '08 #4

Dale

"rich" <rb*****@gmail.com> wrote in message
news:bd**********************************@56g2000hsm.googlegroups.com...
On Aug 1, 11:31 am, rich <rbro...@gmail.com> wrote:

Don't get me wrong though... I'm leaning toward the MySQL way, I just
think this is important enough to get more opinions on before I charge
ahead.

Couple more things if anyone else wants to chime in here...

I ran the test again just now, and I got:

Did first in 1.1379570960999 seconds
Did second in 5.2290420532227 seconds

When I said before the times were tied, I think I was being dumb and
the queries were cached. I'm pretty sure there's no way I can count
on these being cached live because of all the possible combinations.

== think again! and, those are pretty simple queries in your example. what
does your criteria look like?

Aug 1 '08 #5

rich

On Aug 1, 12:00 pm, "Dale" <the....@example.com> wrote:

== think again! and, those are pretty simple queries in your example. what
does your criteria look like?

Aug 1 '08 #6

Jerry Stuckle

rich wrote:

I'm having a tough time figuring out which of these two options is
best. This is a matter of processing my data in PHP vs. MySQL.
Usually that's a no-brainer, but I have a couple of gotchas here and
would love any and all opinions. I'm going to make this as short
and simple as I can...

<snip>

So, in short, both run at the same speed, but which one is more
scalable?

Thanks.

Rich,

Let the database do its job.

In general, you will get the best performance with a single SQL call
returning all of the data.

But you indicate the database is denormalized. Although denormalizing a
database can at times improve speed, it cuts the scalability of the
application, and as your database grows, it can actually slow down
performance because duplicate data being returned uses up the caches
much more quickly. So the absolute last thing you should do to improve
performance is to denormalize your database.

But this is getting too much off topic here. You can get more info on
this in comp.databases.mysql, as well as help in tuning your mysql system.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Aug 1 '08 #7

Dale

"rich" <rb*****@gmail.com> wrote in message
news:e3**********************************@m36g2000hse.googlegroups.com...
On Aug 1, 12:00 pm, "Dale" <the....@example.com> wrote:

== think again! and, those are pretty simple queries in your example. what
does your criteria look like?

Since the table is just denormalized data, the queries are very
simple. Right now there's only one where clause to just pull product
for the current category. From there, the worst it'll get is give me
anything in red and size 11, etc. No multiple tables or joins here.
But this is why I say I don't think I can rely on the query cache.
For all the attributes we allow the user to filter for, there's
probably millions of combinations across all possible outcomes.

== perhaps not for the millions, but i'm sure there will be several that are
recurring. even still, the question is scaling. as i said before, php cannot
cache and cannot index. php also has zero execution plan. further, you can
adjust a db's execution plan based on those recurring patterns of
user-defined criteria! if you're still talking about scaling and you're
still thinking php, you've just opted-out of three very key performance
enhancers.

also, do you need to select * from? why not optimize your query?

// first, you've specified only the columns you really need
// second, you've asked the db to make them as distinct
// as it can get it...

SELECT DISTINCT
    manufacturer AS brand,
    categoryid   AS category,
    colorCode    AS color,
    modelYear    AS modelYear,
    bootWidth    AS width,
    flexRating   AS rating
FROM products
<< search criteria >>

// at this point, there is far less data
// that php has to churn through.
// the array keys below assume db::execute() returns rows keyed by
// the column aliases exactly as written in the SELECT list
$manufacturers = array();
$categories = array();
$colors = array();
$modelYears = array();
$bootWidths = array();
$flexRatings = array();
$records = db::execute($sql);
foreach ($records as $record)
{
    // using the value as its own key keeps each value only once
    $manufacturers[$record['brand']] = $record['brand'];
    $categories[$record['category']] = $record['category'];
    $colors[$record['color']] = $record['color'];
    $modelYears[$record['modelYear']] = $record['modelYear'];
    $bootWidths[$record['width']] = $record['width'];
    $flexRatings[$record['rating']] = $record['rating'];
}

now, what are your test results with these changes?
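
The keyed-array trick above can be wrapped in one reusable function. A minimal sketch, assuming PHP 7 or later; `distinct_by_column` is a hypothetical name and the sample rows are made up for illustration:

```php
<?php
// Collect the distinct values of several columns in one pass over the rows.
function distinct_by_column(array $rows, array $columns): array
{
    $distinct = array_fill_keys($columns, []);
    foreach ($rows as $row) {
        foreach ($columns as $column) {
            $value = $row[$column];
            $distinct[$column][$value] = $value;   // key = value, so repeats overwrite
        }
    }
    // Drop the helper keys and return plain lists, one per column.
    return array_map('array_values', $distinct);
}

// Made-up sample rows standing in for the query result.
$rows = [
    ['brand' => 'Acme', 'color' => 'red',  'modelYear' => 2008],
    ['brand' => 'Acme', 'color' => 'blue', 'modelYear' => 2008],
    ['brand' => 'Zeta', 'color' => 'red',  'modelYear' => 2007],
];
$facets = distinct_by_column($rows, ['brand', 'color', 'modelYear']);
```

Because the values double as array keys, this works as written for strings and integers only. Fractional values (a boot width stored as a decimal, say) would need to be cast to strings first, since PHP truncates float keys to integers.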

Aug 1 '08 #8

Dale

"Jerry Stuckle" <js*******@attglobal.net> wrote in message
news:0Y******************************@comcast.com...

rich wrote:

<snip>

But you indicate the database is denormalized. Although denormalizing a
database can at times improve speed, it cuts the scalability of the
application, and as your database grows, it can actually slow down
performance because duplicate data being returned uses up the caches much
more quickly. So the absolute last thing you should do to improve
performance is to denormalize your database.

But this is getting too much off topic here. You can get more info on
this in comp.databases.mysql, as well as help in tuning your mysql system.

in case you missed it, jerry-berry, his POV is that he's got a PHP system
and not a mysql one, as you've put it. his post directly deals with PHP, and
it just so happens that mysql and apache often come up in the course of
discussion. get used to it. better yet, IGNORE OT CONTENT. god knows you go
OT at every possible opportunity!

Aug 1 '08 #9

Dale

"Jerry Stuckle" <js*******@attglobal.net> wrote in message
news:0Y******************************@comcast.com...

rich wrote:

<snip>

Let the database do its job.

In general, you will get the best performance with a single SQL call
returning all of the data.

But you indicate the database is denormalized. Although denormalizing a
database can at times improve speed, it cuts the scalability of the
application, and as your database grows, it can actually slow down
performance because duplicate data being returned uses up the caches much
more quickly. So the absolute last thing you should do to improve
performance is to denormalize your database.

i will say however, jerry, that i couldn't agree more here. rich should
notice that having 'manufacturers', 'categories', 'colors', and 'flex
ratings' in their own tables would reduce the number of rows to be scanned
by a ton. for the other columns, which may be non-standard lookups, it means
that a SELECT DISTINCT over the product table just for those columns would
greatly reduce the number of 'duplicates' returned to php. his individual
selects for mfg's, cat's, etc. should be lightning fast at that point too.

it doesn't happen often, but you actually gave good advice here. i'm
shocked! :^)

Aug 1 '08 #10
