By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
424,959 Members | 1,136 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 424,959 IT Pros & Developers. It's quick & easy.

Mix of Match() Against(), & Match() Against(in boolean mode) = Non-Sorted results

DTeCH
P: 23
I have a bit of a problem here...


The following code returns matches that the users searched for - nicely sorted by relevance, & super fast:
Expand|Select|Wrap|Line Numbers
  1. SELECT
  2.     *
  3. FROM
  4.     search_table
  5. WHERE
  6.     MATCH (Title_Column) AGAINST (
  7.         'Whatever the user searched for'
  8.     )
  9. LIMIT 25;


The problem shows up when i do the following to narrow the results by category:
Expand|Select|Wrap|Line Numbers
  1. SELECT
  2.     *
  3. FROM
  4.     search_table
  5. WHERE
  6.     MATCH (Title_Column) AGAINST (
  7.         'Whatever the user searched for'
  8.     )
  9. AND (
  10.     MATCH (Category_Column) AGAINST ('Music' IN BOOLEAN MODE)
  11. )
  12. LIMIT 25;
The results are by no means sorted, & they are not related to what the user searched for even though they put quite a few different words in the query. I also added In Boolen MODE to the second part that filters by the category because I'm not sure if it will try to order by category instead.

I am by no means a database guru, or even a noob... I'm 10 feet below that.



I have tried the following, but it completely defeats the benefits of having an index. Although it does the job, it's performance is poor compared to the above examples:
Expand|Select|Wrap|Line Numbers
  1. SELECT
  2.     *, MATCH (Title_Column) AGAINST (
  3.         'Michael Jackson' IN Boolean MODE
  4.     ) AS Relevance
  5. FROM
  6.     search_table
  7. WHERE
  8.     (
  9.         MATCH (Title_Column) AGAINST (
  10.             'Michael Jackson' IN Boolean MODE
  11.         )
  12.         AND MATCH (Category_Column) AGAINST ('Music' IN Boolean MODE)
  13.     )
  14. ORDER BY
  15.     Relevance DESC
  16. LIMIT 25;

I was 3 seconds away from deciding to re-learn some other SQL language because this is as far as i can get - speed being the requirement, & it's not usable.

Here are the requirements:
  1. SPEED
  2. User selects 1 of 23 categories. Category 1 is "ALL CATEGORIES". A few others are a mix of categories - for example: "Music" on the search page category list is actually a combination of "Music-MP3", "Music-FLACK", "Music-WMA", & "Music-Video"... all separate categories in the Category_Column's column that's assigned to each relevant record in the table.
  3. Search Title_Column for user's search term, & filer it by the categories selected.
  4. Results MUST be sorted by relevance, AND return results according the the user's search term.
  5. Did I mention Speed?



Ok... here's the CREATE statement:
Expand|Select|Wrap|Line Numbers
  1. delimiter $$
  2.  
  3. CREATE TABLE `search_table` (
  4.   `Category_Column` tinytext NOT NULL,
  5.   `s_Size` varchar(12) NOT NULL,
  6.   `Title_Column` tinytext NOT NULL,
  7.   `User_TITLE_Column` tinytext NOT NULL,
  8.   `s_PostDate` varchar(40) NOT NULL,
  9.   `s_Poster` tinytext NOT NULL,
  10.   `s_ItemHASH` varchar(40) NOT NULL,
  11.   `s_ItemName` tinytext NOT NULL,
  12.   `s_ItemNumber` bigint(9) NOT NULL,
  13.   `s_DownloadsCount` bigint(9) NOT NULL,
  14.   `s_URL` tinytext NOT NULL,
  15.   `tid` bigint(9) NOT NULL auto_increment,
  16.   `s_CatImage` varchar(30) NOT NULL,
  17.   `s_UnixTimeStamp` bigint(10) NOT NULL,
  18.   `s_ID` varchar(40) NOT NULL,
  19.   PRIMARY KEY  (`tid`),
  20.   FULLTEXT KEY `iCategory_Column` (`Category_Column`),  
  21.   FULLTEXT KEY `iTitle_Column` (`Title_Column`),   
  22.   FULLTEXT KEY `iUserTITLE_Column` (`User_TITLE_Column`),   
  23.   FULLTEXT KEY `i_Poster` (`s_Poster`)   
  24. ) ENGINE=MyISAM AUTO_INCREMENT=737319 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC$$

FULLTEXT indexes are of B-Tree type


I would like to see examples of the proper syntax that allows the user to search, & return relevant results just as the fisrt (FAST) example I gave above does, but with the category filter.

I would like it to be as fast as the first search example, or even the second (that returns non-relevant results), but it MUST continue to sort the relevant results just as it does without the filter.


Any examples would be awesomely appreciated.


MySQL 5.0.XX

The table has just under 1,000,000 records... 879,366 records to be exact.

Columns are all less than 255 characters.


Thanks
Jun 27 '12 #1
Share this Question
Share on Google+
2 Replies


Expert 100+
P: 1,031
First of all, when you do a 'SELECT * FROM table' there is NO way of knowing in which order your data is returned. If you want it to be returned in a specific order you have to use the 'ORDER BY'-clause.

Your 'Category_Column' is wrongly defined. If you say 'Category 1 is "ALL CATEGORIES". ', i will ask you, why you did not define that field as INT(2) ??

How will you show 'ALL CATEGORIES', if there are records with the category 'Music', which clearly is a diffent category than 'ALL CATEGORIES' ?

Selecting an index on a INT(2) field is quicker than selecting on a FULL TEXT index.

If you want to combine categories in your final result you can always do someting like:
SELECT * from table where Category IN (1,3,5)
to select the category one, three and five.
Jul 1 '12 #2

DTeCH
P: 23
The 'ORDER BY' function drastically slows the response time, but I agree.

The 'Category_Column' is different from the site's category dropdown box. If the user chooses 'All Categories' from the site's dropdown box, then I don't filter... The resulting query is plain (and fast) like the following:

Expand|Select|Wrap|Line Numbers
  1.     SELECT
  2.         *
  3.     FROM
  4.         search_table
  5.     WHERE
  6.         MATCH (Title_Column) AGAINST (
  7.             'Whatever the user searched for'
  8.         )
  9.     LIMIT 25;
  10.  


I also agree on Int(2), but for now, we're still building, & am trying to avoid confusion, & possibilities for mistakes, so I leave them visually understandable. When it's all done, & acceptable for the production server, I'll change it. I need every possible second shaved from the response times.

This will be accessed from public/private software, & sites (many), & if the response times of each query is slow, then that will lead to a site lockup, crash, or lag/freeze.

Thanks for your response :)
Jul 15 '12 #3

Post your reply

Sign in to post your reply or Sign up for a free account.