Connecting Tech Pros Worldwide Forums | Help | Site Map

MySQL and searching TEXT fields

Michi
Guest
 
Posts: n/a
#1: Jul 20 '05
I was wondering what the best solution is for making large numbers of
TEXT (or BLOB?) fields searchable. For example, if I have a forum,
what is the best way to be able to search for specific words? How
about exact phrases?

I saw a solution where all words are preindexed in a "dictionary" like
table and then another table stores the word matches. That seems
really fast, but it has two major problems: 1) it can't do exact
matches, and 2) the match table would get insanely big really fast.

What is the "standard" way of doing this...?

Bill Karwin
Guest
 
Posts: n/a
#2: Jul 20 '05

re: MySQL and searching TEXT fields


Michi wrote:
[color=blue]
> I was wondering what the best solution is for making large numbers of
> TEXT (or BLOB?) fields searchable. For example, if I have a forum,
> what is the best way to be able to search for specific words? How
> about exact phrases?[/color]

You should read MySQL's documentation on full-text searching, and see if
it solves your requirements:

http://dev.mysql.com/doc/mysql/en/Fulltext_Search.html

Bill K.
Michi
Guest
 
Posts: n/a
#3: Jul 20 '05

re: MySQL and searching TEXT fields


Bill Karwin <bill@karwin.com> wrote in message news:<ckupid11bmi@enews2.newsguy.com>...[color=blue]
> Michi wrote:
>[color=green]
> > I was wondering what the best solution is for making large numbers of
> > TEXT (or BLOB?) fields searchable. For example, if I have a forum,
> > what is the best way to be able to search for specific words? How
> > about exact phrases?[/color]
>
> You should read MySQL's documentation on full-text searching, and see if
> it solves your requirements:
>
> http://dev.mysql.com/doc/mysql/en/Fulltext_Search.html
>
> Bill K.[/color]

There seems to be a large consensus that the searching is very slow.
When I said "large" I really meant in the few hundred thousand to
millions of records. Would you say this is really the best solution
for large scale applications?
Bill Karwin
Guest
 
Posts: n/a
#4: Jul 20 '05

re: MySQL and searching TEXT fields


Michi wrote:[color=blue]
> There seems to be a large consensus that the searching is very slow.
> When I said "large" I really meant in the few hundred thousand to
> millions of records.[/color]

Yeah, I pretty much ignore when people say "large number" because it's
so subjective. A small retail shop calls it large database if it's
20MB, while the IRS might call it a large database if it's 200TB (maybe
not even then).
[color=blue]
> Would you say this is really the best solution
> for large scale applications?[/color]

Maybe not. The "dictionary table" you mentioned seems more likely.

Yes, it will have a lot of rows, but they don't all have to store the
word being searched; you can store a table of words, and assign an
integer to each one. And an integer to each record with the text blob.
Then you have a words_in_blob table where each record is simply two
integers that are foreign keys to the other tables.

You could also populate that dictionary table only with words that are
actually chosen by users as search criteria, instead of every word that
occurs in the text. The first person to use a given word might take an
extra performance hit as their word is indexed and added to the lookup
table. But subsequent searches for that word will be very fast. Faster
even than if you had pre-filled the entire dictionary into the table,
since naturally the table and its index will be smaller.

Or you could try using a full-text index for a while, and log the most
frequently chosen search words. Then switch the implementation over to
the dictionary table method, and prepopulate it with words for which
people have actually searched.

There is usually a tradeoff between cost of storage and cost of
computation. Luckily, the cost of storage these days is very favorable.

Regards,
Bill K.
steve
Guest
 
Posts: n/a
#5: Jul 20 '05

re: MySQL and searching TEXT fields


"Bill Karwin1" wrote:[color=blue]
> Michi wrote:[color=green]
> > There seems to be a large consensus that the searching is[/color]
> very slow.[color=green]
> > When I said "large" I really meant in the few hundred[/color]
> thousand to[color=green]
> > millions of records.[/color]
>
> Yeah, I pretty much ignore when people say "large number"
> because it's
> so subjective. A small retail shop calls it large database if
> it's
> 20MB, while the IRS might call it a large database if it's
> 200TB (maybe
> not even then).
>[color=green]
> > Would you say this is really the best solution
> > for large scale applications?[/color]
>
> Maybe not. The "dictionary table" you mentioned seems more
> likely.
>
> Yes, it will have a lot of rows, but they don't all have to
> store the
> word being searched; you can store a table of words, and
> assign an
> integer to each one. And an integer to each record with the
> text blob.
> Then you have a words_in_blob table where each record is
> simply two
> integers that are foreign keys to the other tables.
>
> You could also populate that dictionary table only with words
> that are
> actually chosen by users as search criteria, instead of every
> word that
> occurs in the text. The first person to use a given word
> might take an
> extra performance hit as their word is indexed and added to
> the lookup
> table. But subsequent searches for that word will be very
> fast. Faster
> even than if you had pre-filled the entire dictionary into the
> table,
> since naturally the table and its index will be smaller.
>
> Or you could try using a full-text index for a while, and log
> the most
> frequently chosen search words. Then switch the
> implementation over to
> the dictionary table method, and prepopulate it with words for
> which
> people have actually searched.
>
> There is usually a tradeoff between cost of storage and cost
> of
> computation. Luckily, the cost of storage these days is very
> favorable.
>
> Regards,
> Bill K.[/color]

Mysql full text is slow for large tables, BUT the slowness in my
opinion is due to the large amount of time to retrieve the large
fulltext index file from disk to memory. So the easy solution (if you
can) is to simply force the fulltext index to sit in memory.
Building on that, if you can have all the indecis sit in memory at all
times, then your performance is great. In mysql, you can dedicate a
chunk of ram to index cache, and if that area is larger than the sum
of all your indecis, then you will run like a speeding bullet.

--
http://www.dbForumz.com/ This article was posted by author's request
Articles individually checked for conformance to usenet standards
Topic URL: http://www.dbForumz.com/mySQL-search...ict161795.html
Visit Topic URL to contact author (reg. req'd). Report abuse: http://www.dbForumz.com/eform.php?p=548290
Closed Thread


Similar MySQL Database bytes