Relevancy scores are normally defined by a MySQL query on a table that
has a fulltext index. The rules for relevancy scoring will exclude
certain words due to their being too short (minimum default is 4
letters).
This is the Fed. Everything is a TLA (three-letter acronym).
Therefore, since I'm building a PORTABLE web application, changing
MySQL's default settings for fulltext index querying is completely
undoable and unrealistic, so I created a "fake fulltext query", that
is, a plain query that masquerades as if it were a fulltext index
(only a lot slower but there is nothing I can do about it).
However, the client wants the exact same relevancy score as would be
found in a fulltext query. So I wrote a function that should
calculate the relevancy score based upon information provided by MySQL
AB's Sergei Golubchik. Here it is:
[PHP]
/**
* Calculate the relevancy score of a search query if the "SELECT
MATCH () AGAINST ()" fulltext indexing query is unavailable
*
* @access public
* @param object $result Resultset
* @param object $row Row of resultset to calculate relevancy
* @param mixed $keyword
* @param mixed $tableName name of table to perform counting query
to calculate $rows value
* @return float $score
* @see DBConnection
* @see MySQLQuery
* @see http://groups.google.com/groups?q=mysql+fulltext+maximum+score&hl=en&lr=&se lm=9tt6u1%24sca%241%40FreeBSD.csie.NCTU.edu.tw&rnu m=1
*/
function calculate_relevancy($result, $row, $keyword, $tableName) {
/*-------------------------------------------------------------------------------------------------------------------------
Explanation of integer variables:
1) $w1 = intermediate weight
2) $dcount = number of times word is present in document (or query
row)
3) $sumw1 = sum of all values of $w1 found
4) $uniq = number of unique words in the document (or query row)
5) $rows = total number of rows in the table
6) $found = number of rows in the table that contain the word in
question (@sizeof($result))
7) $qcount = number of times this word is present in the query
-------------------------------------------------------------------------------------------------------------------------*/
list($w1, $dcount, $sumw1, $uniq, $rows, $found, $qcount) = array(0,
0, 0, 0, 0, 0, 0);
$isSetUniq = false;
for ($i = 0; $i < @sizeof($result); $i++) {
foreach (@array_keys(get_object_vars($row)) as $field) {
$dcount += @substr_count($result[$i]->$field, $keyword);
if (!$isSetUniq) $uniq +=
@sizeof(array_flip(str_word_count($row->$field), 1));
}
$w1 = log((float)$dcount) + 1;
$sumw1Array[$i] += $w1;
$qcount += $sumw1Array[$i];
$dcount = 0;
$isSetUniq = true;
}
$found = @sizeof($result);
if (!isset($rows)) {
static $rows = 0;
global $dbHost, $dbPort, $dbUser, $dbPwd, $dbDefaultName;
$dbConnObj =& new DBConnection($dbHost, $dbPort, $dbUser, $dbPwd,
$dbDefaultName);
$dbConn = $dbConnObj->connect();
$query =& new MySQLQuery('SELECT count(r.*) AS rows FROM ' .
$this->getTempNameName() . ' r LIMIT 1', $dbConn);
$tempTableRowCountResult = $query->getResult();
$query = null;
$dbConnObj->close();
$dbConn = null;
$dbConnObj = null;
$rows = $tempTableRowCountResult[0]->rows;
}
$score = (float)($w1 / $sumw1 * $uniq / (1 + 0.0115 * $uniq) *
log((float)(($rows - $found) / $found)) * $qcount);
return $score;
}
[/PHP]
This completely fails because it produces NAN due to $sumw1 always
equaling INF (infinity). I am not math-savvy enough to know how to
fix this problem so I appeal to higher intelligence for help on this.
Thanx
Phil 2 3970
Phil Powell wrote: Relevancy scores are normally defined by a MySQL query on a table that has a fulltext index. The rules for relevancy scoring will exclude certain words due to their being too short (minimum default is 4 letters).
This is the Fed. Everything is a TLA (three-letter acronym).
Therefore, since I'm building a PORTABLE web application, changing MySQL's default settings for fulltext index querying is completely undoable and unrealistic, so I created a "fake fulltext query", that is, a plain query that masquerades as if it were a fulltext index (only a lot slower but there is nothing I can do about it).
However, the client wants the exact same relevancy score as would be found in a fulltext query. So I wrote a function that should calculate the relevancy score based upon information provided by MySQL AB's Sergei Golubchik. Here it is:
[PHP]
<snip code>
$score = (float)($w1 / $sumw1 * $uniq / (1 + 0.0115 * $uniq) * log((float)(($rows - $found) / $found)) * $qcount);
try replacing the $sumw1 with:
array_sum($sumw1Array)
--
Justin Koivisto - sp**@koivi.com
http://www.koivi.com
Justin Koivisto <sp**@koivi.com> wrote in message news:<VY****************@news7.onvoy.net>... Phil Powell wrote:
Relevancy scores are normally defined by a MySQL query on a table that has a fulltext index. The rules for relevancy scoring will exclude certain words due to their being too short (minimum default is 4 letters).
This is the Fed. Everything is a TLA (three-letter acronym).
Therefore, since I'm building a PORTABLE web application, changing MySQL's default settings for fulltext index querying is completely undoable and unrealistic, so I created a "fake fulltext query", that is, a plain query that masquerades as if it were a fulltext index (only a lot slower but there is nothing I can do about it).
However, the client wants the exact same relevancy score as would be found in a fulltext query. So I wrote a function that should calculate the relevancy score based upon information provided by MySQL AB's Sergei Golubchik. Here it is:
[PHP]
<snip code>
$score = (float)($w1 / $sumw1 * $uniq / (1 + 0.0115 * $uniq) * log((float)(($rows - $found) / $found)) * $qcount);
try replacing the $sumw1 with:
array_sum($sumw1Array)
I had the formula interpretation wrong. $sumw1 is the sum of all $w1
values which are the weights of each word that comprises the compound
keyword, for example, if your keyword is "Hello World" then you have
$w1 for "Hello" and $w1 for "World".
Phil This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: JT |
last post by:
I have a MySQL fulltext search form in place and I now want to filter the
results further. I've added a few multiple select boxes on my form
(ResourceType and Topic), and I'm able to build and...
|
by: James Turner |
last post by:
I am trying to store formatted text (windows format) into a MySQL
database and then retrieve it. The field in the database is a
varchar. I cut and paste the test into a form field formatted, then...
|
by: Maziar Aflatoun |
last post by:
Hi,
I have the following table
CREATE TABLE `fulltext_sample` (
`copy` text,
FULLTEXT KEY `copy` (`copy`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `fulltext_sample` VALUES...
|
by: Henry Hank |
last post by:
Environment:
I'm setting up a database server on a Dell Poweredge 2650, dual 1.8GHZ pentium
with 1GB of memory and RAID5 drives. I've installed RedHat 9, and updated the
kernel to 2.4.20-19.9smp....
|
by: Phil Powell |
last post by:
Retracing my problem leads me to believe I never successfully created
fulltext indexes for MySQL 3.23.58 MyISAM tables. I went to the MySQL
manual and was able - or so I thought - to create them,...
|
by: chromis |
last post by:
Hi,
I've never coded in ASP before and I'm trying to port a couple of simple PHP files to ASP.NET. The first file addScores.php takes form data and a hash and inserts the data into the db, it...
|
by: chromis |
last post by:
Hi,
I wouldn't post such a lazy question usually but I've got to have this sorted for the end of the day, sorry! Could someone tell me how to write this statement in MSSQL please?
...
|
by: pavanip |
last post by:
Hi,
I am developing one application in that i have to calculate bank account credit scores.Please anyone give me the suggestion about how to calculate credit scores and how to verify bank account...
|
by: AshleyB |
last post by:
2 phpbb boards I run have been hacked. One I've restored but the other one has had some damage to the MySQL db. A table is missing.
As it's a styles table I thought I could could copy a table,...
|
by: isladogs |
last post by:
The next Access Europe meeting will be on Wednesday 7 Feb 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:30 (7.30PM).
In this month's session, the creator of the excellent VBE...
|
by: MeoLessi9 |
last post by:
I have VirtualBox installed on Windows 11 and now I would like to install Kali on a virtual machine. However, on the official website, I see two options: "Installer images" and "Virtual machines"....
|
by: DolphinDB |
last post by:
Tired of spending countless mintues downsampling your data? Look no further!
In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
|
by: Aftab Ahmad |
last post by:
So, I have written a code for a cmd called "Send WhatsApp Message" to open and send WhatsApp messaage. The code is given below.
Dim IE As Object
Set IE =...
|
by: ryjfgjl |
last post by:
ExcelToDatabase: batch import excel into database automatically...
|
by: isladogs |
last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM).
In this month's session, we are pleased to welcome back...
|
by: isladogs |
last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM).
In this month's session, we are pleased to welcome back...
|
by: Vimpel783 |
last post by:
Hello!
Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
|
by: PapaRatzi |
last post by:
Hello,
I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
| |