Why does MATCH/AGAINST fail to catch entries that LIKE does catch?


Weird. Go to this page:

http://www.ihanuman.com/search.php

and search for "yoga"

This query gets run:

SELECT * FROM albums WHERE MATCH(name,description) AGAINST ('yoga')
ORDER BY id DESC

It returns nothing. (Other searches work, but not the one for
"yoga".)

But if I do SELECT * FROM albums WHERE description LIKE '%yoga%'

then I get 5 matches.

What the hell??

Jun 19 '07 #1
Rik
This is clearly a MySQL issue, not a PHP one.
However, I think this passage from the manual might shed some light:

"The MySQL FULLTEXT implementation regards any sequence of true word
characters (letters, digits, and underscores) as a word. That sequence may
also contain apostrophes (“'”), but not more than one in a row. This means
that aaa'bbb is regarded as one word, but aaa''bbb is regarded as two
words. Apostrophes at the beginning or the end of a word are stripped by
the FULLTEXT parser; 'aaa'bbb' would be parsed as aaa'bbb.

The FULLTEXT parser determines where words start and end by looking for
certain delimiter characters; for example, “ ” (space), “,” (comma), and
“.” (period). If words are not separated by delimiters (as in, for
example, Chinese), the FULLTEXT parser cannot determine where a word
begins or ends. To be able to add words or other indexed terms in such
languages to a FULLTEXT index, you must preprocess them so that they are
separated by some arbitrary delimiter such as “"”.

Some words are ignored in full-text searches:

Any word that is too short is ignored. The default minimum length of words
that are found by full-text searches is four characters.

Words in the stopword list are ignored. A stopword is a word such as “the”
or “some” that is so common that it is considered to have zero semantic
value. There is a built-in stopword list, but it can be overwritten by a
user-defined list."

"Every correct word in the collection and in the query is weighted
according to its significance in the collection or query. Consequently, a
word that is present in many documents has a lower weight (and may even
have a zero weight), because it has lower semantic value in this
particular collection. Conversely, if the word is rare, it receives a
higher weight. The weights of the words are combined to compute the
relevance of the row.

Such a technique works best with large collections (in fact, it was
carefully tuned this way). For very small tables, word distribution does
not adequately reflect their semantic value, and this model may sometimes
produce bizarre results. For example, although the word “MySQL” is present
in every row of the articles table shown earlier, a search for the word
produces no results:

mysql> SELECT * FROM articles
    -> WHERE MATCH (title,body) AGAINST ('MySQL');
Empty set (0.00 sec)

The search result is empty because the word “MySQL” is present in at least
50% of the rows. As such, it is effectively treated as a stopword. For
large datasets, this is the most desirable behavior: A natural language
query should not return every second row from a 1GB table. For small
datasets, it may be less desirable."
So, in short: 'yoga' might not be found as a separate word, or it might be
considered too 'common' to match. For more details, ask a MySQL group.
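
To make the two rules from the manual excerpt concrete, here is a small Python sketch. It is only a toy model of the quoted behaviour (minimum word length of four, and the "at least 50% of the rows" cutoff), not MySQL's actual implementation, and the album titles are made up for illustration.

```python
import re

MIN_WORD_LEN = 4  # default minimum word length quoted from the manual


def words(text):
    """Split text into lowercase word tokens (letters, digits, underscores)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def like_search(rows, term):
    """Rough equivalent of LIKE '%term%': a plain substring test."""
    return [r for r in rows if term.lower() in r.lower()]


def natural_language_search(rows, term):
    """Toy model of a natural-language MATCH ... AGAINST search."""
    term = term.lower()
    if len(term) < MIN_WORD_LEN:
        return []  # words that are too short are ignored
    hits = [r for r in rows if term in words(r)]
    if len(hits) * 2 >= len(rows):
        return []  # word is in at least 50% of rows: treated like a stopword
    return hits


# Hypothetical data: 'yoga' appears in 3 of 4 rows.
albums = [
    "Yoga for beginners",
    "Morning yoga chants",
    "Kirtan live",
    "Yoga nidra",
]

print(len(like_search(albums, "yoga")))              # 3
print(len(natural_language_search(albums, "yoga")))  # 0
print(natural_language_search(albums, "kirtan"))     # ['Kirtan live']
```

If the albums table is small and most descriptions mention yoga, this would be consistent with LIKE returning rows while MATCH ... AGAINST returns none.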

--
Rik Wasmus
Jun 19 '07 #2
Hi,

You are talking about MySQL, I guess.
My knowledge of MySQL is limited, but I think I can answer your question
based on TSEARCH2 as found in PostgreSQL.
Maybe it helps.

When you use MATCH ... AGAINST you are performing a full-text search, which is
conceptually different from LIKE '%yoga%'.

E.g.:
holayogatralala will match LIKE '%yoga%'
but not MATCH ... AGAINST.

Full-text indexing works on whole words, or on forms derived from a word.
Reducing a word to its base form is called 'stemming'.
E.g.: 'use', 'used', and 'using' can all be stemmed to 'use'.
I *think* 'use' is called a lexeme in this context, not 100% sure though.
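
A quick Python sketch of the whole-word versus substring difference. The word pattern follows the manual excerpt quoted earlier in the thread (letters, digits, underscores, single apostrophes inside a word); it is ASCII-only and my own approximation, not the real FULLTEXT parser, and it does no stemming.

```python
import re

# Approximate word pattern: word characters, optionally joined by
# single apostrophes. Two apostrophes in a row split the word.
WORD = re.compile(r"[A-Za-z0-9_]+(?:'[A-Za-z0-9_]+)*")


def tokenize(text):
    """Return the list of word tokens found in text."""
    return WORD.findall(text)


def substring_match(text, term):
    """Behaves like LIKE '%term%'."""
    return term.lower() in text.lower()


def whole_word_match(text, term):
    """Behaves like a word-based search: term must equal a whole token."""
    return term.lower() in [w.lower() for w in tokenize(text)]


print(substring_match("holayogatralala", "yoga"))     # True
print(whole_word_match("holayogatralala", "yoga"))    # False
print(whole_word_match("hatha yoga, daily", "yoga"))  # True
print(tokenize("aaa'bbb"))    # ["aaa'bbb"]
print(tokenize("aaa''bbb"))   # ['aaa', 'bbb']
print(tokenize("'aaa'bbb'"))  # ["aaa'bbb"]
```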

Hope that helps.

Regards,
Erwin Moller
Jun 19 '07 #3
hello,

<cit>
The search result is empty because the word ['yoga'] is present in at
least 50% of the rows. As such, it is effectively treated as a stopword.
For large datasets, this is the most desirable behavior: A natural
language query should not return every second row from a 1GB table. For
small datasets, it may be less desirable.
</cit>

in : http://dev.mysql.com/doc/refman/5.0/...xt-search.html

--
@@@@@
E -00 comme on est very beaux dis !
' `) /
|\_ =="
Jun 19 '07 #4
try adding "IN BOOLEAN MODE" to your query:
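
For concreteness, this is roughly what that suggestion looks like when applied to the query from the original post. It is a sketch and not the poster's own text. The Python wrapper, the helper name with_boolean_mode, and the %s placeholder (the style used by common MySQL drivers for Python) are my assumptions; only the table and column names come from the thread. As far as I know from the MySQL manual, boolean-mode searches do not apply the 50% threshold, which is why this may return the "yoga" rows.

```python
# Query from the original post, with the search term as a driver placeholder.
NATURAL_QUERY = (
    "SELECT * FROM albums "
    "WHERE MATCH(name, description) AGAINST (%s) "
    "ORDER BY id DESC"
)


def with_boolean_mode(query):
    """Hypothetical helper: add the IN BOOLEAN MODE modifier to AGAINST (%s)."""
    return query.replace("AGAINST (%s)", "AGAINST (%s IN BOOLEAN MODE)")


BOOLEAN_QUERY = with_boolean_mode(NATURAL_QUERY)
print(BOOLEAN_QUERY)

# With a DB-API cursor from a MySQL driver this would be run as, e.g.:
#   cursor.execute(BOOLEAN_QUERY, ("yoga",))
```

Written out by hand, the AGAINST clause would read AGAINST ('yoga' IN BOOLEAN MODE). The minimum word length and the stopword list still apply in boolean mode, so very short or very common English words can still come back empty.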
--
gosha bine

extended php parser ~ http://code.google.com/p/pihipi
blok ~ http://www.tagarga.com/blok
Jun 19 '07 #5
Try asking MySQL questions in a MySQL newsgroup - such as
comp.databases.mysql. That's where the MySQL experts hang out.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Jun 19 '07 #6
On Jun 19, 4:34 am, Rik <luiheidsgoe...@hotmail.com> wrote:
> So, in short: 'yoga' might not be found as a separate word, or be
> considered too 'common' to match. For more details, ask a MySQL group.

Thanks very much. That is very helpful to me.
Jun 19 '07 #7
