473,386 Members | 1,786 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

Using PHP to calculate MySQL fulltext relevancy scores by hand

Relevancy scores are normally defined by a MySQL query on a table that
has a fulltext index. The rules for relevancy scoring will exclude
certain words due to their being too short (minimum default is 4
letters).

This is the Fed. Everything is a TLA (three-letter acronym).

Therefore, since I'm building a PORTABLE web application, changing
MySQL's default settings for fulltext index querying is completely
undoable and unrealistic, so I created a "fake fulltext query", that
is, a plain query that masquerades as if it were a fulltext index
(only a lot slower but there is nothing I can do about it).

However, the client wants the exact same relevancy score as would be
found in a fulltext query. So I wrote a function that should
calculate the relevancy score based upon information provided by MySQL
AB's Sergei Golubchik. Here it is:

[PHP]
/**
* Calculate the relevancy score of a search query if the "SELECT
MATCH () AGAINST ()" fulltext indexing query is unavailable
*
* @access public
* @param object $result Resultset
* @param object $row Row of resultset to calculate relevancy
* @param mixed $keyword
* @param mixed $tableName name of table to perform counting query
to calculate $rows value
* @return float $score
* @see DBConnection
* @see MySQLQuery
* @see http://groups.google.com/groups?q=mysql+fulltext+maximum+score&hl=en&lr=&se lm=9tt6u1%24sca%241%40FreeBSD.csie.NCTU.edu.tw&rnu m=1
*/
function calculate_relevancy($result, $row, $keyword, $tableName) {
/*-------------------------------------------------------------------------------------------------------------------------
Explanation of integer variables:

1) $w1 = intermediate weight
2) $dcount = number of times word is present in document (or query
row)
3) $sumw1 = sum of all values of $w1 found
4) $uniq = number of unique words in the document (or query row)
5) $rows = total number of rows in the table
6) $found = number of rows in the table that contain the word in
question (@sizeof($result))
7) $qcount = number of times this word is present in the query
-------------------------------------------------------------------------------------------------------------------------*/

list($w1, $dcount, $sumw1, $uniq, $rows, $found, $qcount) = array(0,
0, 0, 0, 0, 0, 0);

$isSetUniq = false;

for ($i = 0; $i < @sizeof($result); $i++) {
foreach (@array_keys(get_object_vars($row)) as $field) {
$dcount += @substr_count($result[$i]->$field, $keyword);
if (!$isSetUniq) $uniq +=
@sizeof(array_flip(str_word_count($row->$field), 1));
}
$w1 = log((float)$dcount) + 1;
$sumw1Array[$i] += $w1;
$qcount += $sumw1Array[$i];
$dcount = 0;
$isSetUniq = true;
}
$found = @sizeof($result);

if (!isset($rows)) {
static $rows = 0;
global $dbHost, $dbPort, $dbUser, $dbPwd, $dbDefaultName;
$dbConnObj =& new DBConnection($dbHost, $dbPort, $dbUser, $dbPwd,
$dbDefaultName);
$dbConn = $dbConnObj->connect();
$query =& new MySQLQuery('SELECT count(r.*) AS rows FROM ' .
$this->getTempNameName() . ' r LIMIT 1', $dbConn);
$tempTableRowCountResult = $query->getResult();
$query = null;
$dbConnObj->close();
$dbConn = null;
$dbConnObj = null;
$rows = $tempTableRowCountResult[0]->rows;
}

$score = (float)($w1 / $sumw1 * $uniq / (1 + 0.0115 * $uniq) *
log((float)(($rows - $found) / $found)) * $qcount);
return $score;

}

[/PHP]

This completely fails because it produces NAN due to $sumw1 always
equaling INF (infinity). I am not math-savvy enough to know how to
fix this problem so I appeal to higher intelligence for help on this.

Thanx
Phil
Jul 17 '05 #1
2 3979
Phil Powell wrote:
Relevancy scores are normally defined by a MySQL query on a table that
has a fulltext index. The rules for relevancy scoring will exclude
certain words due to their being too short (minimum default is 4
letters).

This is the Fed. Everything is a TLA (three-letter acronym).

Therefore, since I'm building a PORTABLE web application, changing
MySQL's default settings for fulltext index querying is completely
undoable and unrealistic, so I created a "fake fulltext query", that
is, a plain query that masquerades as if it were a fulltext index
(only a lot slower but there is nothing I can do about it).

However, the client wants the exact same relevancy score as would be
found in a fulltext query. So I wrote a function that should
calculate the relevancy score based upon information provided by MySQL
AB's Sergei Golubchik. Here it is:

[PHP]
<snip code>
$score = (float)($w1 / $sumw1 * $uniq / (1 + 0.0115 * $uniq) *
log((float)(($rows - $found) / $found)) * $qcount);


try replacing the $sumw1 with:

array_sum($sumw1Array)

--
Justin Koivisto - sp**@koivi.com
http://www.koivi.com
Jul 17 '05 #2
Justin Koivisto <sp**@koivi.com> wrote in message news:<VY****************@news7.onvoy.net>...
Phil Powell wrote:
Relevancy scores are normally defined by a MySQL query on a table that
has a fulltext index. The rules for relevancy scoring will exclude
certain words due to their being too short (minimum default is 4
letters).

This is the Fed. Everything is a TLA (three-letter acronym).

Therefore, since I'm building a PORTABLE web application, changing
MySQL's default settings for fulltext index querying is completely
undoable and unrealistic, so I created a "fake fulltext query", that
is, a plain query that masquerades as if it were a fulltext index
(only a lot slower but there is nothing I can do about it).

However, the client wants the exact same relevancy score as would be
found in a fulltext query. So I wrote a function that should
calculate the relevancy score based upon information provided by MySQL
AB's Sergei Golubchik. Here it is:

[PHP]


<snip code>
$score = (float)($w1 / $sumw1 * $uniq / (1 + 0.0115 * $uniq) *
log((float)(($rows - $found) / $found)) * $qcount);


try replacing the $sumw1 with:

array_sum($sumw1Array)


I had the formula interpretation wrong. $sumw1 is the sum of all $w1
values which are the weights of each word that comprises the compound
keyword, for example, if your keyword is "Hello World" then you have
$w1 for "Hello" and $w1 for "World".

Phil
Jul 17 '05 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: JT | last post by:
I have a MySQL fulltext search form in place and I now want to filter the results further. I've added a few multiple select boxes on my form (ResourceType and Topic), and I'm able to build and...
6
by: James Turner | last post by:
I am trying to store formatted text (windows format) into a MySQL database and then retrieve it. The field in the database is a varchar. I cut and paste the test into a form field formatted, then...
2
by: Maziar Aflatoun | last post by:
Hi, I have the following table CREATE TABLE `fulltext_sample` ( `copy` text, FULLTEXT KEY `copy` (`copy`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; INSERT INTO `fulltext_sample` VALUES...
0
by: Henry Hank | last post by:
Environment: I'm setting up a database server on a Dell Poweredge 2650, dual 1.8GHZ pentium with 1GB of memory and RAID5 drives. I've installed RedHat 9, and updated the kernel to 2.4.20-19.9smp....
0
by: Phil Powell | last post by:
Retracing my problem leads me to believe I never successfully created fulltext indexes for MySQL 3.23.58 MyISAM tables. I went to the MySQL manual and was able - or so I thought - to create them,...
9
by: chromis | last post by:
Hi, I've never coded in ASP before and I'm trying to port a couple of simple PHP files to ASP.NET. The first file addScores.php takes form data and a hash and inserts the data into the db, it...
1
by: chromis | last post by:
Hi, I wouldn't post such a lazy question usually but I've got to have this sorted for the end of the day, sorry! Could someone tell me how to write this statement in MSSQL please? ...
4
by: pavanip | last post by:
Hi, I am developing one application in that i have to calculate bank account credit scores.Please anyone give me the suggestion about how to calculate credit scores and how to verify bank account...
6
by: AshleyB | last post by:
2 phpbb boards I run have been hacked. One I've restored but the other one has had some damage to the MySQL db. A table is missing. As it's a styles table I thought I could could copy a table,...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.