471,316 Members | 1,325 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 471,316 software developers and data experts.

MySQL Query Optimization

I have two tables. The first is "Locations" which has 52,000 zip codes
w/cooresponding latitudes and longitudes. The second "Suppliers" has 2,000
company names and addresses.

The user enters a zip code and the cooresponding latitude and longitude is
return.

$sql = "SELECT l.latitude, l.longitude ";
$sql .= "FROM Locations l";
$sql .= "WHERE l.zipcode = ".$zip;

I use PHP to calculate the "high" and "low" latitude and longitude for 20
mile radius.

I then use these values to return a list of suppliers.

$sql = "SELECT s.suppliername, s.address1, s.address2, s.city, s.state,
s.zip, s.phone ";
$sql .= "FROM Suppliers s ";
$sql .= "INNER JOIN Locations l ON l.zipcode = s.zip ";
$sql .= "WHERE l.latitude <= ".$highLatitude." ";
$sql .= "AND l.latitude >= ".$lowLatitude." ";
$sql .= "AND l.longitude >= ".$lowLongitude." ";
$sql .= "AND l.longitude <= ".$highLongitude." ";
$sql .= "ORDER BY s.state, s.suppliername, s.city";

This works great and returns the desired results. The problem is the query
can take 15 seconds or more for broader searches. This is too long and
ideally needs to be no more than 5 seconds.

My questions are:

1) How can the above query be optimized?

The "Locations" table is not indexed. I have read that columns used in the
WHERE clause are most important for indexing.

2) What should the index sizes be based on the # of records noted above?

Any advice/suggestions would be greatly appreciated.

Thanks!
Jul 17 '05 #1
4 2267
Hi !

On Thu, 05 Feb 2004 02:57:52 GMT, "Bacci" <ba***@shimmi.com> wrote:
I have two tables. The first is "Locations" which has 52,000 zip codes
w/cooresponding latitudes and longitudes. The second "Suppliers" has 2,000
company names and addresses.

The user enters a zip code and the cooresponding latitude and longitude is
return.

$sql = "SELECT l.latitude, l.longitude ";
$sql .= "FROM Locations l";
$sql .= "WHERE l.zipcode = ".$zip;

I use PHP to calculate the "high" and "low" latitude and longitude for 20
mile radius.

I then use these values to return a list of suppliers.

$sql = "SELECT s.suppliername, s.address1, s.address2, s.city, s.state,
s.zip, s.phone ";
$sql .= "FROM Suppliers s ";
$sql .= "INNER JOIN Locations l ON l.zipcode = s.zip ";
$sql .= "WHERE l.latitude <= ".$highLatitude." ";
$sql .= "AND l.latitude >= ".$lowLatitude." ";
$sql .= "AND l.longitude >= ".$lowLongitude." ";
$sql .= "AND l.longitude <= ".$highLongitude." ";
$sql .= "ORDER BY s.state, s.suppliername, s.city";

This works great and returns the desired results. The problem is the query
can take 15 seconds or more for broader searches. This is too long and
ideally needs to be no more than 5 seconds.

My questions are:

1) How can the above query be optimized?

The "Locations" table is not indexed. I have read that columns used in the
WHERE clause are most important for indexing.
Of course. IMO best here are two indexes on Locations:
1: on the primary key (zipcode?) and latitude
2: on the primary key and longitude.

This should cause the query optimizer to do an index scan on each of
them, then merge them (thats the important part) and then look up the
suppliers.

2) What should the index sizes be based on the # of records noted above?
As your query only has where conditions on one table, there is only
one option here. But, this or the other way, best is to have unique
indexes on the big table.

HTH, Jochen

Any advice/suggestions would be greatly appreciated.

Thanks!


--
Jochen Daum - CANS Ltd.
PHP DB Edit Toolkit -- PHP scripts for building
database editing interfaces.
http://sourceforge.net/projects/phpdbedittk/
Jul 17 '05 #2
Hi Jochen,

Thank you for your response.

My table looks like this:

LocationID
ZipCode
City
State
StatusCode
AreaCode
TimeZone
Latitude
Longitude

I indexed LocationID with Latitude and Longitude, but unfortunately it
didn't speed things up. I didn't specify an index size... perhaps that'd
help (?)

The clients wants the ability to search up to 100 miles, but the current
performance is unacceptable.

For example, I get the following results using "10010":

5 miles, 36 items returns, 3 seconds
10 miles, 77 items returns, 7 seconds
25 miles, 170 items returns, 18 seconds
50 miles, 243 items returns, 28 seconds
100 miles, 341 items returns, 62 seconds

I could break up the results in multiple pages, but they don't want to pay
for it (and I'm not working for free.)

Any other suggestions would be appreciated.

"Jochen Daum" <jo*********@cans.co.nz> wrote in message
news:ri********************************@4ax.com...
Hi !

On Thu, 05 Feb 2004 02:57:52 GMT, "Bacci" <ba***@shimmi.com> wrote:
I have two tables. The first is "Locations" which has 52,000 zip codes
w/cooresponding latitudes and longitudes. The second "Suppliers" has 2,000company names and addresses.

The user enters a zip code and the cooresponding latitude and longitude isreturn.

$sql = "SELECT l.latitude, l.longitude ";
$sql .= "FROM Locations l";
$sql .= "WHERE l.zipcode = ".$zip;

I use PHP to calculate the "high" and "low" latitude and longitude for 20
mile radius.

I then use these values to return a list of suppliers.

$sql = "SELECT s.suppliername, s.address1, s.address2, s.city, s.state,
s.zip, s.phone ";
$sql .= "FROM Suppliers s ";
$sql .= "INNER JOIN Locations l ON l.zipcode = s.zip ";
$sql .= "WHERE l.latitude <= ".$highLatitude." ";
$sql .= "AND l.latitude >= ".$lowLatitude." ";
$sql .= "AND l.longitude >= ".$lowLongitude." ";
$sql .= "AND l.longitude <= ".$highLongitude." ";
$sql .= "ORDER BY s.state, s.suppliername, s.city";

This works great and returns the desired results. The problem is the querycan take 15 seconds or more for broader searches. This is too long and
ideally needs to be no more than 5 seconds.

My questions are:

1) How can the above query be optimized?

The "Locations" table is not indexed. I have read that columns used in theWHERE clause are most important for indexing.


Of course. IMO best here are two indexes on Locations:
1: on the primary key (zipcode?) and latitude
2: on the primary key and longitude.

This should cause the query optimizer to do an index scan on each of
them, then merge them (thats the important part) and then look up the
suppliers.

2) What should the index sizes be based on the # of records noted above?


As your query only has where conditions on one table, there is only
one option here. But, this or the other way, best is to have unique
indexes on the big table.

HTH, Jochen

Any advice/suggestions would be greatly appreciated.

Thanks!


--
Jochen Daum - CANS Ltd.
PHP DB Edit Toolkit -- PHP scripts for building
database editing interfaces.
http://sourceforge.net/projects/phpdbedittk/

Jul 17 '05 #3
"Bacci" <ba***@shimmi.com> wrote in
news:QiiUb.177097$sv6.935996@attbi_s52:
I then use these values to return a list of suppliers.

$sql = "SELECT s.suppliername, s.address1, s.address2, s.city,
s.state, s.zip, s.phone ";
$sql .= "FROM Suppliers s ";
$sql .= "INNER JOIN Locations l ON l.zipcode = s.zip ";
$sql .= "WHERE l.latitude <= ".$highLatitude." ";
$sql .= "AND l.latitude >= ".$lowLatitude." ";
$sql .= "AND l.longitude >= ".$lowLongitude." ";
$sql .= "AND l.longitude <= ".$highLongitude." ";
$sql .= "ORDER BY s.state, s.suppliername, s.city";

This works great and returns the desired results. The problem is the
query can take 15 seconds or more for broader searches. This is too
long and ideally needs to be no more than 5 seconds.

My questions are:

1) How can the above query be optimized?

The "Locations" table is not indexed. I have read that columns used in
the WHERE clause are most important for indexing.


Indexing won't help you with a "range search" which is technically what
you're doing. The problem is that you have to examine every single entry
in Locations in order to determine which ones are in range. With some
tricky indexing, you *might* be able to cut the search time in about half,
but you say that you want to expand the ranges involved and so that would
still give you unacceptable search times.

What you really need is a precomputed table of distances between zip codes.
Doing that for every single zip code in the US would be unmanageable, but
fortunately the US population isn't evenly distributed; some zip codes have
a lot more people than others and therefore are more likely to be included
in requests. That means you can generate a list (as your query does) the
first time a query for a particular zip code is made, and then save it for
future use. That way queries for frequently-used zip codes will be very
fast, with the first query for an infrequently-used one taking longer (but
subsequent queries being fast).

If even that gets too big, you might do something similar but at the
regional, rather than local, level of zip codes (e.g. if you know that
someone's zip code starts with "606" (which is the case for everyone in the
city of Chicago) you can look up what other three-digit blocks of zip codes
have any locations that could be within your specified distance (i.e. if
any location in one block is within distance of any location in the other).
Doing it that way would involve slightly less than 500K entries, each of
which would be just a pair of numbers. Then you could add a zone field to
the locations table and index on it, thus limiting your final search to
ranges of zip codes rather than all of them. You'd probably have to
precompute the whole thing rather than cacheing results the way I
previously described, but it wouldn't be that hard.

The distance table would consist of pairs of three-digit zones that meet
the distance criterion, with some arbitrary (as long as it's consistent)
criterion for which goes into the first slot and which into the second (if
you have disk space to throw away, you could just make two entries for each
pair). If you do the latter, then you'd just do something like

$zone=substr($zip,0,3);
SELECT second FROM distances WHERE first=$zone;
[concatenate results into comma-separated string $zones]
SELECT s.suppliername, s.address1, s.address2, s.city,
s.state, s.zip, s.phone
FROM Suppliers s, Locations l
WHERE
INNER JOIN Locations l ON l.zipcode = s.zip
WHERE l.zone IN $zones AND l.latitude <= ".$highLatitude
AND l.latitude >= ".$lowLatitude
AND l.longitude >= ".$lowLongitude
AND l.longitude <= ".$highLongitude
ORDER BY s.state, s.suppliername, s.city;

If you cut the number of entries in half, you just have to
SELECT first,second FROM distances WHERE first=$zone OR second=$zone
and then just go through a slightly more-complicated process to build
$zones.
Jul 17 '05 #4
Eric, thanks for your detailed response. You have given me a lot to think
about.

I have another unrelated question. I need to sort the results by distance,
but my query does not calculate this. I currently calculate this at display
time to save processing. Is there a simple formula to calculate in my query?

Thanks!

"Eric Bohlman" <eb******@earthlink.net> wrote in message
news:Xn*******************************@130.133.1.1 7...
"Bacci" <ba***@shimmi.com> wrote in
news:QiiUb.177097$sv6.935996@attbi_s52:
I then use these values to return a list of suppliers.

$sql = "SELECT s.suppliername, s.address1, s.address2, s.city,
s.state, s.zip, s.phone ";
$sql .= "FROM Suppliers s ";
$sql .= "INNER JOIN Locations l ON l.zipcode = s.zip ";
$sql .= "WHERE l.latitude <= ".$highLatitude." ";
$sql .= "AND l.latitude >= ".$lowLatitude." ";
$sql .= "AND l.longitude >= ".$lowLongitude." ";
$sql .= "AND l.longitude <= ".$highLongitude." ";
$sql .= "ORDER BY s.state, s.suppliername, s.city";

This works great and returns the desired results. The problem is the
query can take 15 seconds or more for broader searches. This is too
long and ideally needs to be no more than 5 seconds.

My questions are:

1) How can the above query be optimized?

The "Locations" table is not indexed. I have read that columns used in
the WHERE clause are most important for indexing.
Indexing won't help you with a "range search" which is technically what
you're doing. The problem is that you have to examine every single entry
in Locations in order to determine which ones are in range. With some
tricky indexing, you *might* be able to cut the search time in about half,
but you say that you want to expand the ranges involved and so that would
still give you unacceptable search times.

What you really need is a precomputed table of distances between zip

codes. Doing that for every single zip code in the US would be unmanageable, but
fortunately the US population isn't evenly distributed; some zip codes have a lot more people than others and therefore are more likely to be included
in requests. That means you can generate a list (as your query does) the
first time a query for a particular zip code is made, and then save it for
future use. That way queries for frequently-used zip codes will be very
fast, with the first query for an infrequently-used one taking longer (but
subsequent queries being fast).

If even that gets too big, you might do something similar but at the
regional, rather than local, level of zip codes (e.g. if you know that
someone's zip code starts with "606" (which is the case for everyone in the city of Chicago) you can look up what other three-digit blocks of zip codes have any locations that could be within your specified distance (i.e. if
any location in one block is within distance of any location in the other). Doing it that way would involve slightly less than 500K entries, each of
which would be just a pair of numbers. Then you could add a zone field to
the locations table and index on it, thus limiting your final search to
ranges of zip codes rather than all of them. You'd probably have to
precompute the whole thing rather than cacheing results the way I
previously described, but it wouldn't be that hard.

The distance table would consist of pairs of three-digit zones that meet
the distance criterion, with some arbitrary (as long as it's consistent)
criterion for which goes into the first slot and which into the second (if
you have disk space to throw away, you could just make two entries for each pair). If you do the latter, then you'd just do something like

$zone=substr($zip,0,3);
SELECT second FROM distances WHERE first=$zone;
[concatenate results into comma-separated string $zones]
SELECT s.suppliername, s.address1, s.address2, s.city,
s.state, s.zip, s.phone
FROM Suppliers s, Locations l
WHERE
INNER JOIN Locations l ON l.zipcode = s.zip
WHERE l.zone IN $zones AND l.latitude <= ".$highLatitude
AND l.latitude >= ".$lowLatitude
AND l.longitude >= ".$lowLongitude
AND l.longitude <= ".$highLongitude
ORDER BY s.state, s.suppliername, s.city;

If you cut the number of entries in half, you just have to
SELECT first,second FROM distances WHERE first=$zone OR second=$zone
and then just go through a slightly more-complicated process to build
$zones.

Jul 17 '05 #5

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

13 posts views Thread by aaron | last post: by
reply views Thread by Solange Ezveff | last post: by
8 posts views Thread by wlcna | last post: by
10 posts views Thread by smorrey | last post: by
39 posts views Thread by Mairhtin O'Feannag | last post: by
6 posts views Thread by ojorus | last post: by
4 posts views Thread by Pseudonyme | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.