"Bacci" <ba***@shimmi.com> wrote in
news:QiiUb.177097$sv6.935996@attbi_s52:
I then use these values to return a list of suppliers.
$sql = "SELECT s.suppliername, s.address1, s.address2, s.city,
s.state, s.zip, s.phone ";
$sql .= "FROM Suppliers s ";
$sql .= "INNER JOIN Locations l ON l.zipcode = s.zip ";
$sql .= "WHERE l.latitude <= ".$highLatitude." ";
$sql .= "AND l.latitude >= ".$lowLatitude." ";
$sql .= "AND l.longitude >= ".$lowLongitude." ";
$sql .= "AND l.longitude <= ".$highLongitude." ";
$sql .= "ORDER BY s.state, s.suppliername, s.city";
This works great and returns the desired results. The problem is the
query can take 15 seconds or more for broader searches. This is too
long and ideally needs to be no more than 5 seconds.
My questions are:
1) How can the above query be optimized?
The "Locations" table is not indexed. I have read that columns used in
the WHERE clause are most important for indexing.
Indexing won't help you with a "range search" which is technically what
you're doing. The problem is that you have to examine every single entry
in Locations in order to determine which ones are in range. With some
tricky indexing, you *might* be able to cut the search time in about half,
but you say that you want to expand the ranges involved and so that would
still give you unacceptable search times.
What you really need is a precomputed table of distances between zip codes.
Doing that for every single zip code in the US would be unmanageable, but
fortunately the US population isn't evenly distributed; some zip codes have
a lot more people than others and therefore are more likely to be included
in requests. That means you can generate a list (as your query does) the
first time a query for a particular zip code is made, and then save it for
future use. That way queries for frequently-used zip codes will be very
fast, with the first query for an infrequently-used one taking longer (but
subsequent queries being fast).
If even that gets too big, you might do something similar but at the
regional, rather than local, level of zip codes (e.g. if you know that
someone's zip code starts with "606" (which is the case for everyone in the
city of Chicago) you can look up what other three-digit blocks of zip codes
have any locations that could be within your specified distance (i.e. if
any location in one block is within distance of any location in the other).
Doing it that way would involve slightly less than 500K entries, each of
which would be just a pair of numbers. Then you could add a zone field to
the locations table and index on it, thus limiting your final search to
ranges of zip codes rather than all of them. You'd probably have to
precompute the whole thing rather than cacheing results the way I
previously described, but it wouldn't be that hard.
The distance table would consist of pairs of three-digit zones that meet
the distance criterion, with some arbitrary (as long as it's consistent)
criterion for which goes into the first slot and which into the second (if
you have disk space to throw away, you could just make two entries for each
pair). If you do the latter, then you'd just do something like
$zone=substr($zip,0,3);
SELECT second FROM distances WHERE first=$zone;
[concatenate results into comma-separated string $zones]
SELECT s.suppliername, s.address1, s.address2, s.city,
s.state, s.zip, s.phone
FROM Suppliers s, Locations l
WHERE
INNER JOIN Locations l ON l.zipcode = s.zip
WHERE l.zone IN $zones AND l.latitude <= ".$highLatitude
AND l.latitude >= ".$lowLatitude
AND l.longitude >= ".$lowLongitude
AND l.longitude <= ".$highLongitude
ORDER BY s.state, s.suppliername, s.city;
If you cut the number of entries in half, you just have to
SELECT first,second FROM distances WHERE first=$zone OR second=$zone
and then just go through a slightly more-complicated process to build
$zones.