473,288 Members | 1,794 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,288 software developers and data experts.

Using PHP to calculate MySQL fulltext relevancy scores by hand

Relevancy scores are normally defined by a MySQL query on a table that
has a fulltext index. The rules for relevancy scoring will exclude
certain words due to their being too short (minimum default is 4
letters).

This is the Fed. Everything is a TLA (three-letter acronym).

Therefore, since I'm building a PORTABLE web application, changing
MySQL's default settings for fulltext index querying is completely
undoable and unrealistic, so I created a "fake fulltext query", that
is, a plain query that masquerades as if it were a fulltext index
(only a lot slower but there is nothing I can do about it).

However, the client wants the exact same relevancy score as would be
found in a fulltext query. So I wrote a function that should
calculate the relevancy score based upon information provided by MySQL
AB's Sergei Golubchik. Here it is:

[PHP]
/**
* Calculate the relevancy score of a search query if the "SELECT
MATCH () AGAINST ()" fulltext indexing query is unavailable
*
* @access public
* @param object $result Resultset
* @param object $row Row of resultset to calculate relevancy
* @param mixed $keyword
* @param mixed $tableName name of table to perform counting query
to calculate $rows value
* @return float $score
* @see DBConnection
* @see MySQLQuery
* @see http://groups.google.com/groups?q=mysql+fulltext+maximum+score&hl=en&lr=&se lm=9tt6u1%24sca%241%40FreeBSD.csie.NCTU.edu.tw&rnu m=1
*/
function calculate_relevancy($result, $row, $keyword, $tableName) {
/*-------------------------------------------------------------------------------------------------------------------------
Explanation of integer variables:

1) $w1 = intermediate weight
2) $dcount = number of times word is present in document (or query
row)
3) $sumw1 = sum of all values of $w1 found
4) $uniq = number of unique words in the document (or query row)
5) $rows = total number of rows in the table
6) $found = number of rows in the table that contain the word in
question (@sizeof($result))
7) $qcount = number of times this word is present in the query
-------------------------------------------------------------------------------------------------------------------------*/

list($w1, $dcount, $sumw1, $uniq, $rows, $found, $qcount) = array(0,
0, 0, 0, 0, 0, 0);

$isSetUniq = false;

for ($i = 0; $i < @sizeof($result); $i++) {
foreach (@array_keys(get_object_vars($row)) as $field) {
$dcount += @substr_count($result[$i]->$field, $keyword);
if (!$isSetUniq) $uniq +=
@sizeof(array_flip(str_word_count($row->$field), 1));
}
$w1 = log((float)$dcount) + 1;
$sumw1Array[$i] += $w1;
$qcount += $sumw1Array[$i];
$dcount = 0;
$isSetUniq = true;
}
$found = @sizeof($result);

if (!isset($rows)) {
static $rows = 0;
global $dbHost, $dbPort, $dbUser, $dbPwd, $dbDefaultName;
$dbConnObj =& new DBConnection($dbHost, $dbPort, $dbUser, $dbPwd,
$dbDefaultName);
$dbConn = $dbConnObj->connect();
$query =& new MySQLQuery('SELECT count(r.*) AS rows FROM ' .
$this->getTempNameName() . ' r LIMIT 1', $dbConn);
$tempTableRowCountResult = $query->getResult();
$query = null;
$dbConnObj->close();
$dbConn = null;
$dbConnObj = null;
$rows = $tempTableRowCountResult[0]->rows;
}

$score = (float)($w1 / $sumw1 * $uniq / (1 + 0.0115 * $uniq) *
log((float)(($rows - $found) / $found)) * $qcount);
return $score;

}

[/PHP]

This completely fails because it produces NAN due to $sumw1 always
equaling INF (infinity). I am not math-savvy enough to know how to
fix this problem so I appeal to higher intelligence for help on this.

Thanx
Phil
Jul 17 '05 #1
2 3970
Phil Powell wrote:
Relevancy scores are normally defined by a MySQL query on a table that
has a fulltext index. The rules for relevancy scoring will exclude
certain words due to their being too short (minimum default is 4
letters).

This is the Fed. Everything is a TLA (three-letter acronym).

Therefore, since I'm building a PORTABLE web application, changing
MySQL's default settings for fulltext index querying is completely
undoable and unrealistic, so I created a "fake fulltext query", that
is, a plain query that masquerades as if it were a fulltext index
(only a lot slower but there is nothing I can do about it).

However, the client wants the exact same relevancy score as would be
found in a fulltext query. So I wrote a function that should
calculate the relevancy score based upon information provided by MySQL
AB's Sergei Golubchik. Here it is:

[PHP]
<snip code>
$score = (float)($w1 / $sumw1 * $uniq / (1 + 0.0115 * $uniq) *
log((float)(($rows - $found) / $found)) * $qcount);


try replacing the $sumw1 with:

array_sum($sumw1Array)

--
Justin Koivisto - sp**@koivi.com
http://www.koivi.com
Jul 17 '05 #2
Justin Koivisto <sp**@koivi.com> wrote in message news:<VY****************@news7.onvoy.net>...
Phil Powell wrote:
Relevancy scores are normally defined by a MySQL query on a table that
has a fulltext index. The rules for relevancy scoring will exclude
certain words due to their being too short (minimum default is 4
letters).

This is the Fed. Everything is a TLA (three-letter acronym).

Therefore, since I'm building a PORTABLE web application, changing
MySQL's default settings for fulltext index querying is completely
undoable and unrealistic, so I created a "fake fulltext query", that
is, a plain query that masquerades as if it were a fulltext index
(only a lot slower but there is nothing I can do about it).

However, the client wants the exact same relevancy score as would be
found in a fulltext query. So I wrote a function that should
calculate the relevancy score based upon information provided by MySQL
AB's Sergei Golubchik. Here it is:

[PHP]


<snip code>
$score = (float)($w1 / $sumw1 * $uniq / (1 + 0.0115 * $uniq) *
log((float)(($rows - $found) / $found)) * $qcount);


try replacing the $sumw1 with:

array_sum($sumw1Array)


I had the formula interpretation wrong. $sumw1 is the sum of all $w1
values which are the weights of each word that comprises the compound
keyword, for example, if your keyword is "Hello World" then you have
$w1 for "Hello" and $w1 for "World".

Phil
Jul 17 '05 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: JT | last post by:
I have a MySQL fulltext search form in place and I now want to filter the results further. I've added a few multiple select boxes on my form (ResourceType and Topic), and I'm able to build and...
6
by: James Turner | last post by:
I am trying to store formatted text (windows format) into a MySQL database and then retrieve it. The field in the database is a varchar. I cut and paste the test into a form field formatted, then...
2
by: Maziar Aflatoun | last post by:
Hi, I have the following table CREATE TABLE `fulltext_sample` ( `copy` text, FULLTEXT KEY `copy` (`copy`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; INSERT INTO `fulltext_sample` VALUES...
0
by: Henry Hank | last post by:
Environment: I'm setting up a database server on a Dell Poweredge 2650, dual 1.8GHZ pentium with 1GB of memory and RAID5 drives. I've installed RedHat 9, and updated the kernel to 2.4.20-19.9smp....
0
by: Phil Powell | last post by:
Retracing my problem leads me to believe I never successfully created fulltext indexes for MySQL 3.23.58 MyISAM tables. I went to the MySQL manual and was able - or so I thought - to create them,...
9
by: chromis | last post by:
Hi, I've never coded in ASP before and I'm trying to port a couple of simple PHP files to ASP.NET. The first file addScores.php takes form data and a hash and inserts the data into the db, it...
1
by: chromis | last post by:
Hi, I wouldn't post such a lazy question usually but I've got to have this sorted for the end of the day, sorry! Could someone tell me how to write this statement in MSSQL please? ...
4
by: pavanip | last post by:
Hi, I am developing one application in that i have to calculate bank account credit scores.Please anyone give me the suggestion about how to calculate credit scores and how to verify bank account...
6
by: AshleyB | last post by:
2 phpbb boards I run have been hacked. One I've restored but the other one has had some damage to the MySQL db. A table is missing. As it's a styles table I thought I could could copy a table,...
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 7 Feb 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:30 (7.30PM). In this month's session, the creator of the excellent VBE...
0
by: MeoLessi9 | last post by:
I have VirtualBox installed on Windows 11 and now I would like to install Kali on a virtual machine. However, on the official website, I see two options: "Installer images" and "Virtual machines"....
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: Aftab Ahmad | last post by:
So, I have written a code for a cmd called "Send WhatsApp Message" to open and send WhatsApp messaage. The code is given below. Dim IE As Object Set IE =...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.