By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,592 Members | 1,958 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,592 IT Pros & Developers. It's quick & easy.

MySQL and searching TEXT fields

P: n/a
I was wondering what the best solution is for making large numbers of
TEXT (or BLOB?) fields searchable. For example, if I have a forum,
what is the best way to be able to search for specific words? How
about exact phrases?

I saw a solution where all words are preindexed in a "dictionary" like
table and then another table stores the word matches. That seems
really fast, but it has two major problems: 1) it can't do exact
matches, and 2) the match table would get insanely big really fast.

What is the "standard" way of doing this...?
Jul 20 '05 #1
Share this Question
Share on Google+
4 Replies


P: n/a
Michi wrote:
I was wondering what the best solution is for making large numbers of
TEXT (or BLOB?) fields searchable. For example, if I have a forum,
what is the best way to be able to search for specific words? How
about exact phrases?


You should read MySQL's documentation on full-text searching, and see if
it solves your requirements:

http://dev.mysql.com/doc/mysql/en/Fulltext_Search.html

Bill K.
Jul 20 '05 #2

P: n/a
Bill Karwin <bi**@karwin.com> wrote in message news:<ck*********@enews2.newsguy.com>...
Michi wrote:
I was wondering what the best solution is for making large numbers of
TEXT (or BLOB?) fields searchable. For example, if I have a forum,
what is the best way to be able to search for specific words? How
about exact phrases?


You should read MySQL's documentation on full-text searching, and see if
it solves your requirements:

http://dev.mysql.com/doc/mysql/en/Fulltext_Search.html

Bill K.


There seems to be a large consensus that the searching is very slow.
When I said "large" I really meant in the few hundred thousand to
millions of records. Would you say this is really the best solution
for large scale applications?
Jul 20 '05 #3

P: n/a
Michi wrote:
There seems to be a large consensus that the searching is very slow.
When I said "large" I really meant in the few hundred thousand to
millions of records.
Yeah, I pretty much ignore when people say "large number" because it's
so subjective. A small retail shop calls it large database if it's
20MB, while the IRS might call it a large database if it's 200TB (maybe
not even then).
Would you say this is really the best solution
for large scale applications?


Maybe not. The "dictionary table" you mentioned seems more likely.

Yes, it will have a lot of rows, but they don't all have to store the
word being searched; you can store a table of words, and assign an
integer to each one. And an integer to each record with the text blob.
Then you have a words_in_blob table where each record is simply two
integers that are foreign keys to the other tables.

You could also populate that dictionary table only with words that are
actually chosen by users as search criteria, instead of every word that
occurs in the text. The first person to use a given word might take an
extra performance hit as their word is indexed and added to the lookup
table. But subsequent searches for that word will be very fast. Faster
even than if you had pre-filled the entire dictionary into the table,
since naturally the table and its index will be smaller.

Or you could try using a full-text index for a while, and log the most
frequently chosen search words. Then switch the implementation over to
the dictionary table method, and prepopulate it with words for which
people have actually searched.

There is usually a tradeoff between cost of storage and cost of
computation. Luckily, the cost of storage these days is very favorable.

Regards,
Bill K.
Jul 20 '05 #4

P: n/a
"Bill Karwin1" wrote:
Michi wrote:
There seems to be a large consensus that the searching is

very slow.
When I said "large" I really meant in the few hundred

thousand to
millions of records.


Yeah, I pretty much ignore when people say "large number"
because it's
so subjective. A small retail shop calls it large database if
it's
20MB, while the IRS might call it a large database if it's
200TB (maybe
not even then).
Would you say this is really the best solution
for large scale applications?


Maybe not. The "dictionary table" you mentioned seems more
likely.

Yes, it will have a lot of rows, but they don't all have to
store the
word being searched; you can store a table of words, and
assign an
integer to each one. And an integer to each record with the
text blob.
Then you have a words_in_blob table where each record is
simply two
integers that are foreign keys to the other tables.

You could also populate that dictionary table only with words
that are
actually chosen by users as search criteria, instead of every
word that
occurs in the text. The first person to use a given word
might take an
extra performance hit as their word is indexed and added to
the lookup
table. But subsequent searches for that word will be very
fast. Faster
even than if you had pre-filled the entire dictionary into the
table,
since naturally the table and its index will be smaller.

Or you could try using a full-text index for a while, and log
the most
frequently chosen search words. Then switch the
implementation over to
the dictionary table method, and prepopulate it with words for
which
people have actually searched.

There is usually a tradeoff between cost of storage and cost
of
computation. Luckily, the cost of storage these days is very
favorable.

Regards,
Bill K.


Mysql full text is slow for large tables, BUT the slowness in my
opinion is due to the large amount of time to retrieve the large
fulltext index file from disk to memory. So the easy solution (if you
can) is to simply force the fulltext index to sit in memory.
Building on that, if you can have all the indecis sit in memory at all
times, then your performance is great. In mysql, you can dedicate a
chunk of ram to index cache, and if that area is larger than the sum
of all your indecis, then you will run like a speeding bullet.

--
http://www.dbForumz.com/ This article was posted by author's request
Articles individually checked for conformance to usenet standards
Topic URL: http://www.dbForumz.com/mySQL-search...ict161795.html
Visit Topic URL to contact author (reg. req'd). Report abuse: http://www.dbForumz.com/eform.php?p=548290
Jul 20 '05 #5

This discussion thread is closed

Replies have been disabled for this discussion.