472,983 Members | 2,773 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,983 software developers and data experts.

MySQL Query Optimization

I have two tables. The first is "Locations" which has 52,000 zip codes
w/cooresponding latitudes and longitudes. The second "Suppliers" has 2,000
company names and addresses.

The user enters a zip code and the cooresponding latitude and longitude is
return.

$sql = "SELECT l.latitude, l.longitude ";
$sql .= "FROM Locations l";
$sql .= "WHERE l.zipcode = ".$zip;

I use PHP to calculate the "high" and "low" latitude and longitude for 20
mile radius.

I then use these values to return a list of suppliers.

$sql = "SELECT s.suppliername, s.address1, s.address2, s.city, s.state,
s.zip, s.phone ";
$sql .= "FROM Suppliers s ";
$sql .= "INNER JOIN Locations l ON l.zipcode = s.zip ";
$sql .= "WHERE l.latitude <= ".$highLatitude." ";
$sql .= "AND l.latitude >= ".$lowLatitude." ";
$sql .= "AND l.longitude >= ".$lowLongitude." ";
$sql .= "AND l.longitude <= ".$highLongitude." ";
$sql .= "ORDER BY s.state, s.suppliername, s.city";

This works great and returns the desired results. The problem is the query
can take 15 seconds or more for broader searches. This is too long and
ideally needs to be no more than 5 seconds.

My questions are:

1) How can the above query be optimized?

The "Locations" table is not indexed. I have read that columns used in the
WHERE clause are most important for indexing.

2) What should the index sizes be based on the # of records noted above?

Any advice/suggestions would be greatly appreciated.

Thanks!
Jul 17 '05 #1
4 2346
Hi !

On Thu, 05 Feb 2004 02:57:52 GMT, "Bacci" <ba***@shimmi.com> wrote:
I have two tables. The first is "Locations" which has 52,000 zip codes
w/cooresponding latitudes and longitudes. The second "Suppliers" has 2,000
company names and addresses.

The user enters a zip code and the cooresponding latitude and longitude is
return.

$sql = "SELECT l.latitude, l.longitude ";
$sql .= "FROM Locations l";
$sql .= "WHERE l.zipcode = ".$zip;

I use PHP to calculate the "high" and "low" latitude and longitude for 20
mile radius.

I then use these values to return a list of suppliers.

$sql = "SELECT s.suppliername, s.address1, s.address2, s.city, s.state,
s.zip, s.phone ";
$sql .= "FROM Suppliers s ";
$sql .= "INNER JOIN Locations l ON l.zipcode = s.zip ";
$sql .= "WHERE l.latitude <= ".$highLatitude." ";
$sql .= "AND l.latitude >= ".$lowLatitude." ";
$sql .= "AND l.longitude >= ".$lowLongitude." ";
$sql .= "AND l.longitude <= ".$highLongitude." ";
$sql .= "ORDER BY s.state, s.suppliername, s.city";

This works great and returns the desired results. The problem is the query
can take 15 seconds or more for broader searches. This is too long and
ideally needs to be no more than 5 seconds.

My questions are:

1) How can the above query be optimized?

The "Locations" table is not indexed. I have read that columns used in the
WHERE clause are most important for indexing.
Of course. IMO best here are two indexes on Locations:
1: on the primary key (zipcode?) and latitude
2: on the primary key and longitude.

This should cause the query optimizer to do an index scan on each of
them, then merge them (thats the important part) and then look up the
suppliers.

2) What should the index sizes be based on the # of records noted above?
As your query only has where conditions on one table, there is only
one option here. But, this or the other way, best is to have unique
indexes on the big table.

HTH, Jochen

Any advice/suggestions would be greatly appreciated.

Thanks!


--
Jochen Daum - CANS Ltd.
PHP DB Edit Toolkit -- PHP scripts for building
database editing interfaces.
http://sourceforge.net/projects/phpdbedittk/
Jul 17 '05 #2
Hi Jochen,

Thank you for your response.

My table looks like this:

LocationID
ZipCode
City
State
StatusCode
AreaCode
TimeZone
Latitude
Longitude

I indexed LocationID with Latitude and Longitude, but unfortunately it
didn't speed things up. I didn't specify an index size... perhaps that'd
help (?)

The clients wants the ability to search up to 100 miles, but the current
performance is unacceptable.

For example, I get the following results using "10010":

5 miles, 36 items returns, 3 seconds
10 miles, 77 items returns, 7 seconds
25 miles, 170 items returns, 18 seconds
50 miles, 243 items returns, 28 seconds
100 miles, 341 items returns, 62 seconds

I could break up the results in multiple pages, but they don't want to pay
for it (and I'm not working for free.)

Any other suggestions would be appreciated.

"Jochen Daum" <jo*********@cans.co.nz> wrote in message
news:ri********************************@4ax.com...
Hi !

On Thu, 05 Feb 2004 02:57:52 GMT, "Bacci" <ba***@shimmi.com> wrote:
I have two tables. The first is "Locations" which has 52,000 zip codes
w/cooresponding latitudes and longitudes. The second "Suppliers" has 2,000company names and addresses.

The user enters a zip code and the cooresponding latitude and longitude isreturn.

$sql = "SELECT l.latitude, l.longitude ";
$sql .= "FROM Locations l";
$sql .= "WHERE l.zipcode = ".$zip;

I use PHP to calculate the "high" and "low" latitude and longitude for 20
mile radius.

I then use these values to return a list of suppliers.

$sql = "SELECT s.suppliername, s.address1, s.address2, s.city, s.state,
s.zip, s.phone ";
$sql .= "FROM Suppliers s ";
$sql .= "INNER JOIN Locations l ON l.zipcode = s.zip ";
$sql .= "WHERE l.latitude <= ".$highLatitude." ";
$sql .= "AND l.latitude >= ".$lowLatitude." ";
$sql .= "AND l.longitude >= ".$lowLongitude." ";
$sql .= "AND l.longitude <= ".$highLongitude." ";
$sql .= "ORDER BY s.state, s.suppliername, s.city";

This works great and returns the desired results. The problem is the querycan take 15 seconds or more for broader searches. This is too long and
ideally needs to be no more than 5 seconds.

My questions are:

1) How can the above query be optimized?

The "Locations" table is not indexed. I have read that columns used in theWHERE clause are most important for indexing.


Of course. IMO best here are two indexes on Locations:
1: on the primary key (zipcode?) and latitude
2: on the primary key and longitude.

This should cause the query optimizer to do an index scan on each of
them, then merge them (thats the important part) and then look up the
suppliers.

2) What should the index sizes be based on the # of records noted above?


As your query only has where conditions on one table, there is only
one option here. But, this or the other way, best is to have unique
indexes on the big table.

HTH, Jochen

Any advice/suggestions would be greatly appreciated.

Thanks!


--
Jochen Daum - CANS Ltd.
PHP DB Edit Toolkit -- PHP scripts for building
database editing interfaces.
http://sourceforge.net/projects/phpdbedittk/

Jul 17 '05 #3
"Bacci" <ba***@shimmi.com> wrote in
news:QiiUb.177097$sv6.935996@attbi_s52:
I then use these values to return a list of suppliers.

$sql = "SELECT s.suppliername, s.address1, s.address2, s.city,
s.state, s.zip, s.phone ";
$sql .= "FROM Suppliers s ";
$sql .= "INNER JOIN Locations l ON l.zipcode = s.zip ";
$sql .= "WHERE l.latitude <= ".$highLatitude." ";
$sql .= "AND l.latitude >= ".$lowLatitude." ";
$sql .= "AND l.longitude >= ".$lowLongitude." ";
$sql .= "AND l.longitude <= ".$highLongitude." ";
$sql .= "ORDER BY s.state, s.suppliername, s.city";

This works great and returns the desired results. The problem is the
query can take 15 seconds or more for broader searches. This is too
long and ideally needs to be no more than 5 seconds.

My questions are:

1) How can the above query be optimized?

The "Locations" table is not indexed. I have read that columns used in
the WHERE clause are most important for indexing.


Indexing won't help you with a "range search" which is technically what
you're doing. The problem is that you have to examine every single entry
in Locations in order to determine which ones are in range. With some
tricky indexing, you *might* be able to cut the search time in about half,
but you say that you want to expand the ranges involved and so that would
still give you unacceptable search times.

What you really need is a precomputed table of distances between zip codes.
Doing that for every single zip code in the US would be unmanageable, but
fortunately the US population isn't evenly distributed; some zip codes have
a lot more people than others and therefore are more likely to be included
in requests. That means you can generate a list (as your query does) the
first time a query for a particular zip code is made, and then save it for
future use. That way queries for frequently-used zip codes will be very
fast, with the first query for an infrequently-used one taking longer (but
subsequent queries being fast).

If even that gets too big, you might do something similar but at the
regional, rather than local, level of zip codes (e.g. if you know that
someone's zip code starts with "606" (which is the case for everyone in the
city of Chicago) you can look up what other three-digit blocks of zip codes
have any locations that could be within your specified distance (i.e. if
any location in one block is within distance of any location in the other).
Doing it that way would involve slightly less than 500K entries, each of
which would be just a pair of numbers. Then you could add a zone field to
the locations table and index on it, thus limiting your final search to
ranges of zip codes rather than all of them. You'd probably have to
precompute the whole thing rather than cacheing results the way I
previously described, but it wouldn't be that hard.

The distance table would consist of pairs of three-digit zones that meet
the distance criterion, with some arbitrary (as long as it's consistent)
criterion for which goes into the first slot and which into the second (if
you have disk space to throw away, you could just make two entries for each
pair). If you do the latter, then you'd just do something like

$zone=substr($zip,0,3);
SELECT second FROM distances WHERE first=$zone;
[concatenate results into comma-separated string $zones]
SELECT s.suppliername, s.address1, s.address2, s.city,
s.state, s.zip, s.phone
FROM Suppliers s, Locations l
WHERE
INNER JOIN Locations l ON l.zipcode = s.zip
WHERE l.zone IN $zones AND l.latitude <= ".$highLatitude
AND l.latitude >= ".$lowLatitude
AND l.longitude >= ".$lowLongitude
AND l.longitude <= ".$highLongitude
ORDER BY s.state, s.suppliername, s.city;

If you cut the number of entries in half, you just have to
SELECT first,second FROM distances WHERE first=$zone OR second=$zone
and then just go through a slightly more-complicated process to build
$zones.
Jul 17 '05 #4
Eric, thanks for your detailed response. You have given me a lot to think
about.

I have another unrelated question. I need to sort the results by distance,
but my query does not calculate this. I currently calculate this at display
time to save processing. Is there a simple formula to calculate in my query?

Thanks!

"Eric Bohlman" <eb******@earthlink.net> wrote in message
news:Xn*******************************@130.133.1.1 7...
"Bacci" <ba***@shimmi.com> wrote in
news:QiiUb.177097$sv6.935996@attbi_s52:
I then use these values to return a list of suppliers.

$sql = "SELECT s.suppliername, s.address1, s.address2, s.city,
s.state, s.zip, s.phone ";
$sql .= "FROM Suppliers s ";
$sql .= "INNER JOIN Locations l ON l.zipcode = s.zip ";
$sql .= "WHERE l.latitude <= ".$highLatitude." ";
$sql .= "AND l.latitude >= ".$lowLatitude." ";
$sql .= "AND l.longitude >= ".$lowLongitude." ";
$sql .= "AND l.longitude <= ".$highLongitude." ";
$sql .= "ORDER BY s.state, s.suppliername, s.city";

This works great and returns the desired results. The problem is the
query can take 15 seconds or more for broader searches. This is too
long and ideally needs to be no more than 5 seconds.

My questions are:

1) How can the above query be optimized?

The "Locations" table is not indexed. I have read that columns used in
the WHERE clause are most important for indexing.
Indexing won't help you with a "range search" which is technically what
you're doing. The problem is that you have to examine every single entry
in Locations in order to determine which ones are in range. With some
tricky indexing, you *might* be able to cut the search time in about half,
but you say that you want to expand the ranges involved and so that would
still give you unacceptable search times.

What you really need is a precomputed table of distances between zip

codes. Doing that for every single zip code in the US would be unmanageable, but
fortunately the US population isn't evenly distributed; some zip codes have a lot more people than others and therefore are more likely to be included
in requests. That means you can generate a list (as your query does) the
first time a query for a particular zip code is made, and then save it for
future use. That way queries for frequently-used zip codes will be very
fast, with the first query for an infrequently-used one taking longer (but
subsequent queries being fast).

If even that gets too big, you might do something similar but at the
regional, rather than local, level of zip codes (e.g. if you know that
someone's zip code starts with "606" (which is the case for everyone in the city of Chicago) you can look up what other three-digit blocks of zip codes have any locations that could be within your specified distance (i.e. if
any location in one block is within distance of any location in the other). Doing it that way would involve slightly less than 500K entries, each of
which would be just a pair of numbers. Then you could add a zone field to
the locations table and index on it, thus limiting your final search to
ranges of zip codes rather than all of them. You'd probably have to
precompute the whole thing rather than cacheing results the way I
previously described, but it wouldn't be that hard.

The distance table would consist of pairs of three-digit zones that meet
the distance criterion, with some arbitrary (as long as it's consistent)
criterion for which goes into the first slot and which into the second (if
you have disk space to throw away, you could just make two entries for each pair). If you do the latter, then you'd just do something like

$zone=substr($zip,0,3);
SELECT second FROM distances WHERE first=$zone;
[concatenate results into comma-separated string $zones]
SELECT s.suppliername, s.address1, s.address2, s.city,
s.state, s.zip, s.phone
FROM Suppliers s, Locations l
WHERE
INNER JOIN Locations l ON l.zipcode = s.zip
WHERE l.zone IN $zones AND l.latitude <= ".$highLatitude
AND l.latitude >= ".$lowLatitude
AND l.longitude >= ".$lowLongitude
AND l.longitude <= ".$highLongitude
ORDER BY s.state, s.suppliername, s.city;

If you cut the number of entries in half, you just have to
SELECT first,second FROM distances WHERE first=$zone OR second=$zone
and then just go through a slightly more-complicated process to build
$zones.

Jul 17 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

13
by: aaron | last post by:
I have a question about (i think) joining. If I have a table in a database that has this info: key - name - favorite 1 - john - 2 2 - judy - 3 3 - joe - 1 the favorite icecream...
0
by: Solange Ezveff | last post by:
Object : How to optimize mysql architecture for better web connection ? Hello, I use phpMyAdmin 2.3.2 and MySQL 3.23.58 and I do not know how to well-organize mysql architecture for the user...
8
by: wlcna | last post by:
mysql v4.0.16: I had been using mysql with innodb and thought that was fine, until i used it for something requiring a few - perhaps slightly involved - joins, and have now seen the performance...
2
by: Belmin | last post by:
Hi all, Wanted to know what is the most efficient way of doing a select query for mysql that only returns one value. For example: $mysqli->query('select count(*) from log'); $temprec =...
10
by: smorrey | last post by:
Hello all, this might better be suited for the MySQL newsgroup, but I figured I'ld post here and see if anyone can help me. I'm trying to create a simple transaction handling system where users...
39
by: Mairhtin O'Feannag | last post by:
Hello, I have a client (customer) who asked the question : "Why would I buy and use UDB, when MySql is free?" I had to say I was stunned. I have no experience with MySql, so I was left sort...
12
by: WantedToBeDBA | last post by:
Hi all, db2 => create table emp(empno int not null primary key, \ db2 (cont.) => sex char(1) not null constraint s_check check \ db2 (cont.) => (sex in ('m','f')) \ db2 (cont.) => not enforced...
8
by: Fred | last post by:
Hello, Our website is currently developed in ASP/Mysql 4. The dedicated servers on which it is currently hosted arrive at saturation. Here is their configuration: - 1 server PIV 2,8Ghz 1GB...
6
by: ojorus | last post by:
Hi! My company make several flash-based games, and I use php to communicate with mysql to provide highscore-lists. My problem is this: When I save a player's score in the mysql-table, I want to...
4
by: Pseudonyme | last post by:
Dear Sirs and Madams, Receive as information that storing a MYSQL result under $_SESSION was accelerating web page displays processes ? Absolutly needed but impossible to get this working ! I...
0
by: lllomh | last post by:
Define the method first this.state = { buttonBackgroundColor: 'green', isBlinking: false, // A new status is added to identify whether the button is blinking or not } autoStart=()=>{
2
by: DJRhino | last post by:
Was curious if anyone else was having this same issue or not.... I was just Up/Down graded to windows 11 and now my access combo boxes are not acting right. With win 10 I could start typing...
0
by: Aliciasmith | last post by:
In an age dominated by smartphones, having a mobile app for your business is no longer an option; it's a necessity. Whether you're a startup or an established enterprise, finding the right mobile app...
2
by: giovanniandrean | last post by:
The energy model is structured as follows and uses excel sheets to give input data: 1-Utility.py contains all the functions needed to calculate the variables and other minor things (mentions...
4
NeoPa
by: NeoPa | last post by:
Hello everyone. I find myself stuck trying to find the VBA way to get Access to create a PDF of the currently-selected (and open) object (Form or Report). I know it can be done by selecting :...
3
NeoPa
by: NeoPa | last post by:
Introduction For this article I'll be using a very simple database which has Form (clsForm) & Report (clsReport) classes that simply handle making the calling Form invisible until the Form, or all...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 1 Nov 2023 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM) Please note that the UK and Europe revert to winter time on...
3
by: nia12 | last post by:
Hi there, I am very new to Access so apologies if any of this is obvious/not clear. I am creating a data collection tool for health care employees to complete. It consists of a number of...
0
NeoPa
by: NeoPa | last post by:
Introduction For this article I'll be focusing on the Report (clsReport) class. This simply handles making the calling Form invisible until all of the Reports opened by it have been closed, when it...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.