By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
432,175 Members | 1,710 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 432,175 IT Pros & Developers. It's quick & easy.

Using PHP to calculate MySQL fulltext relevancy scores by hand

P: n/a
Relevancy scores are normally defined by a MySQL query on a table that
has a fulltext index. The rules for relevancy scoring will exclude
certain words due to their being too short (minimum default is 4
letters).

This is the Fed. Everything is a TLA (three-letter acronym).

Therefore, since I'm building a PORTABLE web application, changing
MySQL's default settings for fulltext index querying is completely
undoable and unrealistic, so I created a "fake fulltext query", that
is, a plain query that masquerades as if it were a fulltext index
(only a lot slower but there is nothing I can do about it).

However, the client wants the exact same relevancy score as would be
found in a fulltext query. So I wrote a function that should
calculate the relevancy score based upon information provided by MySQL
AB's Sergei Golubchik. Here it is:

[PHP]
/**
* Calculate the relevancy score of a search query if the "SELECT
MATCH () AGAINST ()" fulltext indexing query is unavailable
*
* @access public
* @param object $result Resultset
* @param object $row Row of resultset to calculate relevancy
* @param mixed $keyword
* @param mixed $tableName name of table to perform counting query
to calculate $rows value
* @return float $score
* @see DBConnection
* @see MySQLQuery
* @see http://groups.google.com/groups?q=mysql+fulltext+maximum+score&hl=en&lr=&se lm=9tt6u1%24sca%241%40FreeBSD.csie.NCTU.edu.tw&rnu m=1
*/
function calculate_relevancy($result, $row, $keyword, $tableName) {
/*-------------------------------------------------------------------------------------------------------------------------
Explanation of integer variables:

1) $w1 = intermediate weight
2) $dcount = number of times word is present in document (or query
row)
3) $sumw1 = sum of all values of $w1 found
4) $uniq = number of unique words in the document (or query row)
5) $rows = total number of rows in the table
6) $found = number of rows in the table that contain the word in
question (@sizeof($result))
7) $qcount = number of times this word is present in the query
-------------------------------------------------------------------------------------------------------------------------*/

list($w1, $dcount, $sumw1, $uniq, $rows, $found, $qcount) = array(0,
0, 0, 0, 0, 0, 0);

$isSetUniq = false;

for ($i = 0; $i < @sizeof($result); $i++) {
foreach (@array_keys(get_object_vars($row)) as $field) {
$dcount += @substr_count($result[$i]->$field, $keyword);
if (!$isSetUniq) $uniq +=
@sizeof(array_flip(str_word_count($row->$field), 1));
}
$w1 = log((float)$dcount) + 1;
$sumw1Array[$i] += $w1;
$qcount += $sumw1Array[$i];
$dcount = 0;
$isSetUniq = true;
}
$found = @sizeof($result);

if (!isset($rows)) {
static $rows = 0;
global $dbHost, $dbPort, $dbUser, $dbPwd, $dbDefaultName;
$dbConnObj =& new DBConnection($dbHost, $dbPort, $dbUser, $dbPwd,
$dbDefaultName);
$dbConn = $dbConnObj->connect();
$query =& new MySQLQuery('SELECT count(r.*) AS rows FROM ' .
$this->getTempNameName() . ' r LIMIT 1', $dbConn);
$tempTableRowCountResult = $query->getResult();
$query = null;
$dbConnObj->close();
$dbConn = null;
$dbConnObj = null;
$rows = $tempTableRowCountResult[0]->rows;
}

$score = (float)($w1 / $sumw1 * $uniq / (1 + 0.0115 * $uniq) *
log((float)(($rows - $found) / $found)) * $qcount);
return $score;

}

[/PHP]

This completely fails because it produces NAN due to $sumw1 always
equaling INF (infinity). I am not math-savvy enough to know how to
fix this problem so I appeal to higher intelligence for help on this.

Thanx
Phil
Jul 17 '05 #1
Share this Question
Share on Google+
2 Replies


P: n/a
Phil Powell wrote:
Relevancy scores are normally defined by a MySQL query on a table that
has a fulltext index. The rules for relevancy scoring will exclude
certain words due to their being too short (minimum default is 4
letters).

This is the Fed. Everything is a TLA (three-letter acronym).

Therefore, since I'm building a PORTABLE web application, changing
MySQL's default settings for fulltext index querying is completely
undoable and unrealistic, so I created a "fake fulltext query", that
is, a plain query that masquerades as if it were a fulltext index
(only a lot slower but there is nothing I can do about it).

However, the client wants the exact same relevancy score as would be
found in a fulltext query. So I wrote a function that should
calculate the relevancy score based upon information provided by MySQL
AB's Sergei Golubchik. Here it is:

[PHP]
<snip code>
$score = (float)($w1 / $sumw1 * $uniq / (1 + 0.0115 * $uniq) *
log((float)(($rows - $found) / $found)) * $qcount);


try replacing the $sumw1 with:

array_sum($sumw1Array)

--
Justin Koivisto - sp**@koivi.com
http://www.koivi.com
Jul 17 '05 #2

P: n/a
Justin Koivisto <sp**@koivi.com> wrote in message news:<VY****************@news7.onvoy.net>...
Phil Powell wrote:
Relevancy scores are normally defined by a MySQL query on a table that
has a fulltext index. The rules for relevancy scoring will exclude
certain words due to their being too short (minimum default is 4
letters).

This is the Fed. Everything is a TLA (three-letter acronym).

Therefore, since I'm building a PORTABLE web application, changing
MySQL's default settings for fulltext index querying is completely
undoable and unrealistic, so I created a "fake fulltext query", that
is, a plain query that masquerades as if it were a fulltext index
(only a lot slower but there is nothing I can do about it).

However, the client wants the exact same relevancy score as would be
found in a fulltext query. So I wrote a function that should
calculate the relevancy score based upon information provided by MySQL
AB's Sergei Golubchik. Here it is:

[PHP]


<snip code>
$score = (float)($w1 / $sumw1 * $uniq / (1 + 0.0115 * $uniq) *
log((float)(($rows - $found) / $found)) * $qcount);


try replacing the $sumw1 with:

array_sum($sumw1Array)


I had the formula interpretation wrong. $sumw1 is the sum of all $w1
values which are the weights of each word that comprises the compound
keyword, for example, if your keyword is "Hello World" then you have
$w1 for "Hello" and $w1 for "World".

Phil
Jul 17 '05 #3

This discussion thread is closed

Replies have been disabled for this discussion.