Why does MATCH/AGAINST fail to catch entries that LIKE does catch?


Weird. Go to this page:

http://www.ihanuman.com/search.php

and search for "yoga"

This query gets run:

SELECT * FROM albums WHERE MATCH(name,description) AGAINST ('yoga')
ORDER BY id DESC

it returns nothing. (other searches work, but not the one for
"yoga").

But if I do SELECT * FROM albums WHERE description LIKE '%yoga%'

then I get 5 matches

What the hell??

Jun 19 '07 #1
Rik
This is clearly a MySQL issue, not a PHP one.
However, I think this might shed some light, from the manual:

"The MySQL FULLTEXT implementation regards any sequence of true word
characters (letters, digits, and underscores) as a word. That sequence may
also contain apostrophes ('), but not more than one in a row. This means
that aaa'bbb is regarded as one word, but aaa''bbb is regarded as two
words. Apostrophes at the beginning or the end of a word are stripped by
the FULLTEXT parser; 'aaa'bbb' would be parsed as aaa'bbb.

The FULLTEXT parser determines where words start and end by looking for
certain delimiter characters; for example, " " (space), "," (comma), and
"." (period). If words are not separated by delimiters (as in, for
example, Chinese), the FULLTEXT parser cannot determine where a word
begins or ends. To be able to add words or other indexed terms in such
languages to a FULLTEXT index, you must preprocess them so that they are
separated by some arbitrary delimiter such as ".

Some words are ignored in full-text searches:

Any word that is too short is ignored. The default minimum length of words
that are found by full-text searches is four characters.

Words in the stopword list are ignored. A stopword is a word such as "the"
or "some" that is so common that it is considered to have zero semantic
value. There is a built-in stopword list, but it can be overwritten by a
user-defined list."

"Every correct word in the collection and in the query is weighted
according to its significance in the collection or query. Consequently, a
word that is present in many documents has a lower weight (and may even
have a zero weight), because it has lower semantic value in this
particular collection. Conversely, if the word is rare, it receives a
higher weight. The weights of the words are combined to compute the
relevance of the row.

Such a technique works best with large collections (in fact, it was
carefully tuned this way). For very small tables, word distribution does
not adequately reflect their semantic value, and this model may sometimes
produce bizarre results. For example, although the word "MySQL" is present
in every row of the articles table shown earlier, a search for the word
produces no results:

mysql> SELECT * FROM articles
    -> WHERE MATCH (title,body) AGAINST ('MySQL');
Empty set (0.00 sec)

The search result is empty because the word "MySQL" is present in at least
50% of the rows. As such, it is effectively treated as a stopword. For
large datasets, this is the most desirable behavior: A natural language
query should not return every second row from a 1GB table. For small
datasets, it may be less desirable."
So, in short: 'yoga' might not be found as a separate word, or it might be
considered too 'common' to match. For more details, ask a MySQL group.
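
One way to tell which of these cases applies is to compare the number of rows containing the word with the size of the table, and to look at the server's minimum word length. This is only a sketch: it reuses the albums table and its name and description columns from the question, and the LIKE count is an approximation because it also counts 'yoga' inside longer words.

```sql
-- How many rows contain the substring, compared with the whole table?
-- In MySQL a comparison evaluates to 1 or 0, so SUM() counts matching rows.
SELECT COUNT(*) AS total_rows,
       SUM(name LIKE '%yoga%' OR description LIKE '%yoga%') AS rows_with_yoga
FROM albums;

-- Minimum word length indexed by FULLTEXT (the manual's default is 4).
SHOW VARIABLES LIKE 'ft_min_word_len';
```

If rows_with_yoga is half of total_rows or more, the natural-language search treats the word as too common, as the manual excerpt describes. 'yoga' is four characters long, so the default minimum length would not exclude it.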

--
Rik Wasmus
Jun 19 '07 #2
Hi,

You are talking about MySQL I guess.
My knowledge of MySQL is limited, but I think I can answer your question
based on TSEARCH2 as found in Postgresql.
Maybe it helps.

When you use 'match against' you are performing a full text search, which is
conceptually something different than LIKE '%yoga%'.

eg:
holayogatralala will match with: LIKE '%yoga%'
but not in match against.
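
The difference can be seen by running both kinds of query side by side. A sketch, assuming the albums table from the question with a FULLTEXT index on (name, description); the trailing * is MySQL's boolean-mode truncation operator.

```sql
-- Substring match: finds 'yoga' anywhere, including inside 'holayogatralala'.
SELECT id, name FROM albums
WHERE description LIKE '%yoga%';

-- Full-text prefix match: finds words that start with 'yoga' (such as 'yogas'),
-- but not words that merely contain it somewhere in the middle.
SELECT id, name FROM albums
WHERE MATCH(name, description) AGAINST ('yoga*' IN BOOLEAN MODE);
```

As far as I know, MySQL's built-in full-text parser does not stem words, so without the * a search for 'yoga' would not match 'yogas'.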

Full-text indexing works on whole words (or derivatives of a word).
Reducing a word to its base form is called 'stemming'.
eg: 'use', 'used', and 'using' can all be stemmed to 'use'.
I *think* 'use' is called a lexeme in this context, not 100% sure though.

Hope that helps.

Regards,
Erwin Moller
Jun 19 '07 #3
hello,

<cit>
The search result is empty because the word ['yoga'] is present in at
least 50% of the rows. As such, it is effectively treated as a stopword.
For large datasets, this is the most desirable behavior: A natural
language query should not return every second row from a 1GB table. For
small datasets, it may be less desirable.
</cit>

in : http://dev.mysql.com/doc/refman/5.0/...xt-search.html

--
@@@@@
E -00 comme on est very beaux dis !
' `) /
|\_ =="
Jun 19 '07 #4
try adding "IN BOOLEAN MODE" to your query:
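
The suggested change would look roughly like this, keeping the table and columns from the original query. According to the MySQL manual, boolean-mode searches do not apply the 50% threshold, which is why the word can match here even when it appears in most rows.

```sql
-- Same search as the original query, but evaluated in boolean mode.
SELECT * FROM albums
WHERE MATCH(name, description) AGAINST ('yoga' IN BOOLEAN MODE)
ORDER BY id DESC;
```

Boolean mode does not sort rows by relevance on its own, so the ORDER BY id DESC from the original query still decides the order.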
--
gosha bine

extended php parser ~ http://code.google.com/p/pihipi
blok ~ http://www.tagarga.com/blok
Jun 19 '07 #5
Try asking MySQL questions in a MySQL newsgroup - such as
comp.databases.mysql. That's where the MySQL experts hang out.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Jun 19 '07 #6
On Jun 19, 4:34 am, Rik <luiheidsgoe...@hotmail.com> wrote:
> So, in short: 'yoga' might not be found as a separate word, or be
> considered too 'common' to match. For more details, ask a MySQL-group.
Thanks very much. That is very helpful to me.
Jun 19 '07 #7
