By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
431,900 Members | 1,078 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 431,900 IT Pros & Developers. It's quick & easy.

MySQL Speed

P: n/a
Can someone tell me which of these 2 SQL queries will be more efficient? I'm
having a debate with another guy about which would be less resource
intensive for MySQL.

The first uses MySQL to pick a random row in a single statement:
<?php
$sql = "SELECT FROM myTable ORDER BY RAND() LIMIT 1";
$query = mysql_query($sql,$conn);
?>
The second does the same thing, but uses 2 queries to do it:
<?php
$sqlA = "SELECT COUNT(id) FROM myTable";
$queryA = mysql_query($sqlA,$conn);
$num = mysql_result($queryA,0,0);
$random = rand(1,$num);
$sqlB = "SELECT FROM myTable WHERE id='" . $random . "' LIMIT 1";
?>

Basically, which approach would be better to use in a high traffic
environment to retrieve a single random row?
Thanks for any help/advice you can give!

ps. the code may not be 100% correct, it is for demonstration purposes only!
Apr 28 '06 #1
Share this Question
Share on Google+
6 Replies


P: n/a

Ridge Burner wrote:
Can someone tell me which of these 2 SQL queries will be more efficient? I'm
having a debate with another guy about which would be less resource
intensive for MySQL.

The first uses MySQL to pick a random row in a single statement:
<?php
$sql = "SELECT FROM myTable ORDER BY RAND() LIMIT 1";
$query = mysql_query($sql,$conn);
?>
The second does the same thing, but uses 2 queries to do it:
<?php
$sqlA = "SELECT COUNT(id) FROM myTable";
$queryA = mysql_query($sqlA,$conn);
$num = mysql_result($queryA,0,0);
$random = rand(1,$num);
$sqlB = "SELECT FROM myTable WHERE id='" . $random . "' LIMIT 1";
?>

Basically, which approach would be better to use in a high traffic
environment to retrieve a single random row?


The problem here is that the second query is wrong for what you're
trying to do. The number of rows in a table doesn't necessarily have
anything kind of relationship with the primary keys. Rows could have
been deleted or the seed value of the auto-increment column might not
have been 1.

Apr 28 '06 #2

P: n/a

"Chung Leong" <ch***********@hotmail.com> wrote in message
news:11**********************@e56g2000cwe.googlegr oups.com...

Ridge Burner wrote:
Can someone tell me which of these 2 SQL queries will be more efficient?
I'm
having a debate with another guy about which would be less resource
intensive for MySQL.

The first uses MySQL to pick a random row in a single statement:
<?php
$sql = "SELECT FROM myTable ORDER BY RAND() LIMIT 1";
$query = mysql_query($sql,$conn);
?>
The second does the same thing, but uses 2 queries to do it:
<?php
$sqlA = "SELECT COUNT(id) FROM myTable";
$queryA = mysql_query($sqlA,$conn);
$num = mysql_result($queryA,0,0);
$random = rand(1,$num);
$sqlB = "SELECT FROM myTable WHERE id='" . $random . "' LIMIT 1";
?>

Basically, which approach would be better to use in a high traffic
environment to retrieve a single random row?


The problem here is that the second query is wrong for what you're
trying to do. The number of rows in a table doesn't necessarily have
anything kind of relationship with the primary keys. Rows could have
been deleted or the seed value of the auto-increment column might not
have been 1.


What if I change it to say something like 'SELECT COUNT(column1) FROM
myTable WHERE live='0' LIMIT 1'

I understand the primary keys may not always return a reliable result, but
I'm questioning whether it is better to use one query to return a row vs.
using 2 queries to return a random row.

Right now we are experiencing issues with MySQL being able to keep up with
all of the traffic. The first SQL statement is used for ad banners, and our
hoster has mentioned that this is not the most efficient way to do what were
doing. The only other way I can see to retrieve a random row is to query for
the # of rows that match the criteria, then use that # to have PHP generate
a random # between 1 & that #, then re-query MySQL for the row that PHP
generated. After that I'll need to make sure that the row does actually
exist in the table. If it doesn't, I have to start again.

It's just to me, the second approach seems to have a lot more overhead in it
than the first.







Apr 28 '06 #3

P: n/a
Rik
Ridge Burner wrote:
Can someone tell me which of these 2 SQL queries will be more
efficient? I'm having a debate with another guy about which would be
less resource intensive for MySQL.

Depends:
<?php
$db = mysql_connect();
$conn = mysql_select_db("testbase",$db);

$start1 = microtime(true);
for($i=1;$i<2000;$i++){
$sql = "SELECT * FROM wnk_huizen ORDER BY RAND() LIMIT 1";
$query = mysql_query($sql);
$result = mysql_fetch_array($query);
}
$end1 = microtime(true);

$start2 = microtime(true);

for($i=1;$i<2000;$i++){
$value = array();
$sql = "SELECT id FROM wnk_huizen";
$query = mysql_query($sql);
while($result = mysql_fetch_array($query)){
$value[] = $result;
}
$random = rand(0,count($value)-1);
$sql = "SELECT * FROM wnk_huizen WHERE id='" . $value[$random] . "' LIMIT
1";
$query = mysql_query($sql);
$result = mysql_fetch_array($query);
}
$end2 = microtime(true);
echo "<br />result with ".count($value)." records;<br />";
$time1= $end1-$start1;
$time2= $end2-$start2;
echo "<br />Option 1:".$start1."-".$end1." :".$time1;
echo "<br />Option 2:".$start2."-".$end2." :".$time2;
?>

Result from my extremely slow testserver:

result with 69 records;

Option 1:1146258591.25-1146258626.6845 :35.434526205063
Option 2:1146258626.6845-1146258636.4802 :9.7956740856171

result with 2208 records;

Option 1:1146259157.8281-1146259527.2195 :369.39138197899
Option 2:1146259527.2195-1146259655.4201 :128.20057296753

You'd think option 1 is slower, except if I use a different table:

result with 613 records (id=int(4), primary key);

Option 1:1146258636.4804-1146258644.8055 :8.3251550197601
Option 2:1146258644.8056-1146258683.6749 :38.869340896606

I presume it's because the first table had "text" fields, the second
integers and a 2 VARCHAR(50).
for your reference:
TABLE 1:
Field Type Null Key Default Extra
id int(5) PRI NULL auto_increment
stad varchar(50)
postcode varchar(7)
adres varchar(70)
page varchar(40)
prijs varchar(20)
short_desc text
long_desc text
status char(1)
time timestamp YES CURRENT_TIMESTAMP

TABLE 2
Field Type Null Key Default Extra
id int(5) PRI NULL auto_increment
img_name varchar(50)
img_huis_id int(5) 0
img_huis_default tinyint(1) 0
img_huis_desc varchar(50)
So, it's highly dependable on the database, I'm not going to waste time
checking the details myself, but some people on the mysql newsgroup might
now how ORDER BY RAND() is affected by type of fields in a table.

Grtz,
--
Rik Wasmus
Apr 28 '06 #4

P: n/a
Rik
Ridge Burner wrote:
Right now we are experiencing issues with MySQL being able to keep up
The first SQL statement is used for ad
banners, and our hoster has mentioned that this is not the most
efficient way to do what were doing.


See my other post. thought it was just an hypothetical(?) situation. For
this specific problem you can simply test yourself: create the table on your
local server, and test with for($i=1;$i<$number;$i++) where $number is an
arbitrary high number.

Note: this is the time the script takes, not neccesarily how long MySQL
takes. If it's just a case about MySQL which is to busy, and PHP has no
problem keeping up, it's a simple choice to choose to do more processing in
PHP.

Then again, I'm no server admin. Maybe someone else with more experience can
shed more light on the subject of processing by MySQL and PHP on production
servers.

(?)pfff, english, I hope it's the correct word and spelling)

Grtz,
--
Rik Wasmus
Apr 28 '06 #5

P: n/a

Ridge Burner wrote:
Can someone tell me which of these 2 SQL queries will be more efficient? I'm
having a debate with another guy about which would be less resource
intensive for MySQL.

The first uses MySQL to pick a random row in a single statement:
<?php
$sql = "SELECT FROM myTable ORDER BY RAND() LIMIT 1";
$query = mysql_query($sql,$conn);
?>
The second does the same thing, but uses 2 queries to do it:
<?php
$sqlA = "SELECT COUNT(id) FROM myTable";
$queryA = mysql_query($sqlA,$conn);
$num = mysql_result($queryA,0,0);
$random = rand(1,$num);
$sqlB = "SELECT FROM myTable WHERE id='" . $random . "' LIMIT 1";
?>

Basically, which approach would be better to use in a high traffic
environment to retrieve a single random row?
Thanks for any help/advice you can give!

ps. the code may not be 100% correct, it is for demonstration purposes only!


MySQL creates an internal counter for the number of rows in a table (at
least MyISAM as far as I know). So doing "SELECT COUNT(*) FROM table"
is instant. So as far as I know, using the COUNT(*) should be faster,
as long as id is a unique (or primary) key. If in doubt, use the
EXPLAIN syntax to see how many rows are being queried for the two
different queries. Hope that helps.

Apr 28 '06 #6

P: n/a
>>> The first uses MySQL to pick a random row in a single statement:
<?php
$sql = "SELECT FROM myTable ORDER BY RAND() LIMIT 1";
$query = mysql_query($sql,$conn);
?>
The second does the same thing, but uses 2 queries to do it:
<?php
$sqlA = "SELECT COUNT(id) FROM myTable";
$queryA = mysql_query($sqlA,$conn);
$num = mysql_result($queryA,0,0);
$random = rand(1,$num);
$sqlB = "SELECT FROM myTable WHERE id='" . $random . "' LIMIT 1";
?>

Basically, which approach would be better to use in a high traffic
environment to retrieve a single random row?
The problem here is that the second query is wrong for what you're
trying to do. The number of rows in a table doesn't necessarily have
anything kind of relationship with the primary keys. Rows could have
been deleted or the seed value of the auto-increment column might not
have been 1.


What if I change it to say something like 'SELECT COUNT(column1) FROM
myTable WHERE live='0' LIMIT 1'


Consider seriously what happens when the result of this will *NEVER*
match a value for id. For example, suppose id is always a credit
card number (12-16 digits and unlikely to contain leading zeroes, as
Amex, Mastercard, VISA, and Discover begin with 3, 4, 5, and 6, not
necessarily in that order).
I understand the primary keys may not always return a reliable result, but
I'm questioning whether it is better to use one query to return a row vs.
using 2 queries to return a random row.
Any code can be made infinitely fast and run in zero space if it doesn't
have to return a correct answer. And I consider no banner ad to always
be preferable to a banner ad.
Right now we are experiencing issues with MySQL being able to keep up with
all of the traffic. The first SQL statement is used for ad banners, and our
hoster has mentioned that this is not the most efficient way to do what were
doing. The only other way I can see to retrieve a random row is to query for
the # of rows that match the criteria, then use that # to have PHP generate
a random # between 1 & that #, then re-query MySQL for the row that PHP
generated. After that I'll need to make sure that the row does actually
exist in the table. If it doesn't, I have to start again.
If you have 10 rows in your table numbered 77, 83, 84, 85, 89, 92, 93, 94,
95, and 97, you will *NEVER* find a row that exists and it will take
an INFINITE number of queries to generate the page.
It's just to me, the second approach seems to have a lot more overhead in it
than the first.


Gordon L. Burditt
Apr 28 '06 #7

This discussion thread is closed

Replies have been disabled for this discussion.