Good search theory

Hello,

I'm a webmaster for a college newspaper, and I'm implementing an article
search. I'm running PHP with a MySQL database that stores the weekly
stories. Does anyone know of an article that covers good search theory?

My top priorities right now are multiple search terms and relevance
sorting based on how many times the terms occur in each article.

It's easy to search for a single word or term in a body of text: I can
just use a MySQL "WHERE `body` LIKE '%term%'" query. But what about
searching for two terms, or finding the most relevant document based on
how many times the term occurs?
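A LIKE pattern only does a substring search when it includes % wildcards, and several terms can be ANDed in one query instead of one query per term. Below is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL; the `articles` table and `id` column are hypothetical (only `body` comes from the post), and the ESCAPE clause is written for SQLite (MySQL already uses backslash as its default LIKE escape).

```python
import sqlite3


def _like_pattern(term):
    # Escape LIKE wildcards in user input so '%' and '_' match literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def search_all_terms(conn, query):
    """Return ids of articles whose body contains every whitespace-separated term.

    Each term becomes one "body LIKE ?" condition, ANDed together, so a
    single query handles any number of terms.
    """
    terms = query.split()
    if not terms:
        return []
    where = " AND ".join("body LIKE ? ESCAPE '\\'" for _ in terms)
    sql = f"SELECT id FROM articles WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, [_like_pattern(t) for t in terms])]
```

Each extra term adds a condition rather than a separate query, but every search still scans the whole body column, because a leading % wildcard cannot use an ordinary index. SQLite's LIKE is case-insensitive for ASCII only; in MySQL, case sensitivity depends on the column's collation.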

I imagine I would split up the search query and run a separate "LIKE
'%term%'" query for each term. I would have to cap the number of terms at
some arbitrary limit, because searching each article 50 times is not an
option.
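The hit-count relevance described above can be sketched in a few lines of Python, assuming the articles have already been fetched as a dict of id to body text (a hypothetical shape). It still reads every article on every search, which is the cost that index-based approaches avoid.

```python
import re


def rank_by_hits(articles, query):
    """Rank articles by total occurrences of the query terms.

    articles: dict mapping article id -> body text (hypothetical shape).
    Returns (id, hits) pairs, most hits first, omitting articles with no hits.
    """
    terms = [t.lower() for t in query.split()]
    scored = []
    for article_id, body in articles.items():
        # Tokenise into lowercase words so "Tuition," still counts as "tuition".
        words = re.findall(r"\w+", body.lower())
        hits = sum(words.count(term) for term in terms)
        if hits:
            scored.append((article_id, hits))
    # Highest hit count first; ties broken by id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored
```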

It seems like there are a lot of choices in how to set up a good search
system, and I'd like to get started on the right foot to reduce my
workload.

-Aaron

Jul 17 '05 #1

You could look at MySQL's full-text search:

http://dev.mysql.com/doc/mysql/en/fulltext-search.html

Look especially at MATCH() ... AGAINST(), which returns a relevance score for each result.
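A minimal Python sketch that builds a natural-language MATCH() ... AGAINST() query ranked by relevance. It only produces the SQL text; the `articles` table and the `id` and `title` columns are hypothetical (only `body` comes from the question), and it assumes a FULLTEXT index on exactly those columns, e.g. ALTER TABLE articles ADD FULLTEXT (title, body);. The %s placeholders follow the parameter style of Python MySQL drivers such as PyMySQL.

```python
def fulltext_query(table="articles", columns=("title", "body")):
    """Build a MySQL full-text query that returns and sorts by relevance.

    table and columns are interpolated directly, so they must be trusted
    identifiers; the search string itself goes in as a driver parameter.
    """
    cols = ", ".join(columns)
    match = f"MATCH({cols}) AGAINST (%s)"
    # MATCH() in WHERE drops zero-relevance rows; in SELECT it exposes the score.
    return (
        f"SELECT id, {match} AS relevance FROM {table} "
        f"WHERE {match} ORDER BY relevance DESC"
    )
```

With a DB-API cursor, the search string is passed once per placeholder, e.g. cursor.execute(fulltext_query(), (terms, terms)).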

Sacs
Jul 17 '05 #2

Since your search will be run over a body of text, I would suggest using
MySQL's full-text search. It is more efficient and more accurate than simple
LIKE queries, and it also lets you determine the relevance of each result.
None of the searches I've built over the years has worked "exactly" right,
but full-text search is as close as I've ever gotten. Below are some links
that will hopefully point you in the right direction.

http://dev.mysql.com/doc/mysql/en/fulltext-search.html
http://www.sitepoint.com/forums/arch.../t-174265.html
http://dev.mysql.com/doc/mysql/en/Regexp.html
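One concrete way LIKE falls short on accuracy: a pattern such as LIKE '%art%' matches any substring, so it also hits "department" and "start". The small Python sketch below contrasts substring matching with whole-word matching via Python's re word boundaries; it illustrates the idea only and is not MySQL REGEXP syntax. The substring function mimics LIKE under a case-insensitive collation.

```python
import re


def substring_matches(term, texts):
    """Roughly what body LIKE '%term%' does under a case-insensitive collation."""
    return [t for t in texts if term.lower() in t.lower()]


def whole_word_matches(term, texts):
    """Match the term only as a whole word, using regex word boundaries."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [t for t in texts if pattern.search(t)]
```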

Mike

Jul 17 '05 #3

Just let Google do it.
Jul 17 '05 #4

If it's an option for you, have a look at Swish-e:

http://swish-e.org/index.html

I don't know whether there is a PHP interface, though. It's somewhat
difficult to set up, but its authors really did a good job. There are all
kinds of ways to configure Swish-e for META tags and the like.

Proximity and phrase searches are difficult, tricky stuff, but Swish-e
handles them.
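Phrase matching generally needs word positions, not just word counts. Below is a minimal Python sketch of a positional inverted index (word, then article id, then positions) with a phrase query that checks for consecutive positions. It is a generic textbook technique, not Swish-e's internals; a proximity search would relax "consecutive" to "within k words".

```python
import re
from collections import defaultdict


def build_positional_index(docs):
    """Map each word to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted ids of docs containing the phrase's words consecutively."""
    words = re.findall(r"\w+", phrase.lower())
    if not words or any(w not in index for w in words):
        return []
    # Candidate docs must contain every word of the phrase.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    hits = []
    for doc_id in candidates:
        # The phrase starts at p if word i appears at position p + i for every i.
        starts = set(index[words[0]][doc_id])
        for i, w in enumerate(words[1:], start=1):
            starts &= {p - i for p in index[w][doc_id]}
        if starts:
            hits.append(doc_id)
    return sorted(hits)
```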

If Swish-e won't work, another option might be Lucene:

http://lucene.apache.org/java/docs/

It's been a few years, but when I looked into it, Lucene was quite good as
well. It's written in Java, which may be an issue if you're not already
running servlets. It's surprisingly fast, especially considering it's Java.

Another option is ht://Dig:

http://htdig.org/

Last I checked, it didn't do phrase matching, but it's quite mature: it has
been around a long time, and plenty of people use it. It's the easiest of
these to set up that I've seen. If you don't require phrase matching, it's
pretty decent.

All of the tools I've listed build an index and hold up pretty well at scale.
I wouldn't try to use them in place of teoma.com (with the possible exception
of multiple Lucene instances), but I bet they would work well for your
application.
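A toy Python sketch of what "use an index" buys: an inverted index maps each word to the articles containing it, so a query only touches the postings for its own words instead of scanning every article. This is a generic illustration, not how Swish-e, Lucene, or ht://Dig are actually implemented, and the scoring (summed term counts) is a deliberately simple stand-in for their real ranking.

```python
import re
from collections import Counter, defaultdict


class InvertedIndex:
    """Toy inverted index: word -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        # Count each word in the document and record it in that word's postings.
        for word, count in Counter(re.findall(r"\w+", text.lower())).items():
            self.postings[word][doc_id] = count

    def search(self, query):
        """Rank docs by summed term frequency, touching only matching postings."""
        scores = Counter()
        for word in re.findall(r"\w+", query.lower()):
            for doc_id, count in self.postings.get(word, {}).items():
                scores[doc_id] += count
        # Highest score first; ties broken by doc id.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Re-adding an article under the same id does not remove words from its old text; a real indexer also has to handle updates and deletions.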

One could probably fill a small library (or at least a full section of a
library) with books on the subject of searching full text. 'tis not an easy
task.

Maybe I'm prejudiced, but in my opinion SQL databases are not really designed
for full-text searching. (It's been a while, but I've been burned by them for
full-text search in the past.) I suppose an SQL database would work for a few
hundred articles and/or highly custom search tools. (If your articles are in
XML, such a database would be OK for searching titles, or perhaps within
predetermined XML containers like <var>..</var>.)

The "issue" I take with them is that you are effectively using a database
AS an index. A database's primary goal is (or should be) data storage. Fulltext
indices are a different beast altogether.

They are excellent for building a "proof of concept" prototype, but they
quickly break down with larger quantities of data. (This opinion is based on a
context-aware search tool done in 1999; six years is a long time, and things
may have changed.)

They do make good storage for URLs, last-index times, and things like that.
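A small sketch of that bookkeeping role, using Python's sqlite3 as a stand-in for MySQL: a hypothetical pages table records each URL, when it last changed, and when it was last indexed, so only new or changed pages are handed to the external indexer. The table and column names are illustrative assumptions.

```python
import sqlite3


def create_schema(conn):
    # Timestamps are ISO-8601 strings, which compare correctly as text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " url TEXT PRIMARY KEY,"
        " modified TEXT NOT NULL,"
        " last_indexed TEXT)"
    )


def stale_urls(conn):
    """URLs never indexed, or modified since they were last indexed."""
    rows = conn.execute(
        "SELECT url FROM pages "
        "WHERE last_indexed IS NULL OR modified > last_indexed "
        "ORDER BY url"
    )
    return [url for (url,) in rows]


def mark_indexed(conn, url, when):
    """Record that url was handed to the indexer at time `when`."""
    conn.execute("UPDATE pages SET last_indexed = ? WHERE url = ?", (when, url))
```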

Jamie
Jul 17 '05 #5

Thanks for the many solutions, everyone. I'll start with full-text search
because it will take the least effort to get something rudimentary working
quickly. I'll examine the other options listed as well.

Jul 17 '05 #6