Relevancy scores are normally defined by a MySQL query on a table that
has a fulltext index. The rules for relevancy scoring will exclude
certain words due to their being too short (minimum default is 4
letters).
This is the Fed. Everything is a TLA (three-letter acronym).
Therefore, since I'm building a PORTABLE web application, changing
MySQL's default settings for fulltext index querying is completely
undoable and unrealistic, so I created a "fake fulltext query", that
is, a plain query that masquerades as if it were a fulltext index
(only a lot slower but there is nothing I can do about it).
However, the client wants the exact same relevancy score as would be
found in a fulltext query. So I wrote a function that should
calculate the relevancy score based upon information provided by MySQL
AB's Sergei Golubchik. Here it is:
[PHP]
/**
* Calculate the relevancy score of a search query if the "SELECT
MATCH () AGAINST ()" fulltext indexing query is unavailable
*
* @access public
* @param object $result Resultset
* @param object $row Row of resultset to calculate relevancy
* @param mixed $keyword
* @param mixed $tableName name of table to perform counting query
to calculate $rows value
* @return float $score
* @see DBConnection
* @see MySQLQuery
* @see http://groups.google.com/groups?q=mysql+fulltext+maximum+score&hl=en&lr=&se lm=9tt6u1%24sca%241%40FreeBSD.csie.NCTU.edu.tw&rnu m=1
*/
function calculate_relevancy($result, $row, $keyword, $tableName) {
/*-------------------------------------------------------------------------------------------------------------------------
Explanation of integer variables:
1) $w1 = intermediate weight
2) $dcount = number of times word is present in document (or query
row)
3) $sumw1 = sum of all values of $w1 found
4) $uniq = number of unique words in the document (or query row)
5) $rows = total number of rows in the table
6) $found = number of rows in the table that contain the word in
question (@sizeof($result))
7) $qcount = number of times this word is present in the query
-------------------------------------------------------------------------------------------------------------------------*/
list($w1, $dcount, $sumw1, $uniq, $rows, $found, $qcount) = array(0,
0, 0, 0, 0, 0, 0);
$isSetUniq = false;
for ($i = 0; $i < @sizeof($result); $i++) {
foreach (@array_keys(get_object_vars($row)) as $field) {
$dcount += @substr_count($result[$i]->$field, $keyword);
if (!$isSetUniq) $uniq +=
@sizeof(array_flip(str_word_count($row->$field), 1));
}
$w1 = log((float)$dcount) + 1;
$sumw1Array[$i] += $w1;
$qcount += $sumw1Array[$i];
$dcount = 0;
$isSetUniq = true;
}
$found = @sizeof($result);
if (!isset($rows)) {
static $rows = 0;
global $dbHost, $dbPort, $dbUser, $dbPwd, $dbDefaultName;
$dbConnObj =& new DBConnection($dbHost, $dbPort, $dbUser, $dbPwd,
$dbDefaultName);
$dbConn = $dbConnObj->connect();
$query =& new MySQLQuery('SELECT count(r.*) AS rows FROM ' .
$this->getTempNameName() . ' r LIMIT 1', $dbConn);
$tempTableRowCountResult = $query->getResult();
$query = null;
$dbConnObj->close();
$dbConn = null;
$dbConnObj = null;
$rows = $tempTableRowCountResult[0]->rows;
}
$score = (float)($w1 / $sumw1 * $uniq / (1 + 0.0115 * $uniq) *
log((float)(($rows - $found) / $found)) * $qcount);
return $score;
}
[/PHP]
This completely fails because it produces NAN due to $sumw1 always
equaling INF (infinity). I am not math-savvy enough to know how to
fix this problem so I appeal to higher intelligence for help on this.
Thanx
Phil