Looking for a search engine that searches a MySQL database

I have been using phpdig on some websites, but now I have stored a lot of
larger texts in a MySQL database. In the phpdig search engine, when you
entered a search word, the page where the search word was found was displayed
with about 2 lines before and 2 lines after the search word itself. Let us say
you look for "peanut butter" and the word is found in a larger text about
sandwiches; even when it is on the 40th line of the text you would get
something like
"www.mysite.com/sandwich.php
....In Holland peanut butter is a popular spread on sandwiches "
A query like "SELECT title, maintext FROM MYTEXTS WHERE maintext LIKE
'%$searchedword%'" will do most of the job, and I can create a query that
displays only the first 200 characters of maintext, but then there will be an
introductory text about sandwiches and our peanut butter lover may well
skip this page :)
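
A minimal sketch of how the LIKE pattern for such a query could be built, assuming the search word comes straight from user input. The helper name like_pattern is hypothetical, and the sketch assumes MySQL's default LIKE escape character (the backslash) and a prepared statement for the actual query:

```php
<?php
// Hypothetical helper: wrap a user-supplied term in % wildcards for LIKE,
// escaping the characters that LIKE itself treats specially.
function like_pattern(string $term): string
{
    // addcslashes() prefixes each listed character (%, _ and \) with a
    // backslash, which is assumed to be the LIKE escape character here.
    return '%' . addcslashes($term, '%_\\') . '%';
}

$sql   = 'SELECT title, maintext FROM MYTEXTS WHERE maintext LIKE ?';
$param = like_pattern('peanut butter');
// $sql and $param would then go to a prepared statement,
// for example PDO::prepare($sql) followed by execute([$param]).
```

Binding the pattern as a parameter rather than pasting $searchedword into the SQL string also avoids quoting problems with user input.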

But I am puzzled about a command that (either in PHP or MySQL) jumps to
$searchedword in the maintext field and returns a couple of lines around it.
Any ideas? If there is an open source PHP module that could do this I will
be happy too, and maybe I am just overlooking a relatively easy function that
will do the job. Googling for "mysql php search engines" did not give too
many hints.

Thanks for any help.

May 10 '06 #1
Read http://dev.mysql.com/doc/refman/5.0/...xt-search.html
Full-text search is what is mostly used. About the 200 characters I am not
sure.
Also, about the highlighting etc., you could look at MediaWiki's source
( http://www.mediawiki.org/ ).

Thank You.

May 10 '06 #2

"Drakazz" <vy****************@googlemail.com> wrote in message
news:11*********************@g10g2000cwb.googlegroups.com...
Read in http://dev.mysql.com/doc/refman/5.0/...xt-search.html
Full text search is mostly used.

Thank you Drakazz. As I said, I thought maybe I was just entering the wrong
keywords in Google. I didn't think of keywords like "fulltext search", but
on reflection it makes sense to me. I think this article will resolve
a lot of the problem.

Martien
May 10 '06 #3
Rik
Drakazz wrote:
Full text search is mostly used. About the 200 characters I am not
sure.


No idea, but two methods come to mind:
Assuming $text is the returned text from the database, and $string is the
searchword:

Normal functions:

$occurrence = stripos($text, $string);
if ($occurrence !== false) { // stripos() returns false when nothing is found
    $start = max(0, $occurrence - 100);
    $display = substr($text, $start, 200 + strlen($string));
}

The advantage is that it's quick; the disadvantage is that it will only find
the first occurrence, and it will cut up words.
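
As a minimal sketch, the same idea can be wrapped in a function that also copes with a search word that is not found. The name excerpt_first and the $radius parameter are hypothetical, and the sample text is made up; strlen() and substr() count bytes, so for multibyte text the mb_* equivalents would probably be the safer choice:

```php
<?php
// Hypothetical helper: return up to $radius characters on either side of
// the first case-insensitive occurrence of $string in $text.
function excerpt_first(string $text, string $string, int $radius = 100): ?string
{
    $pos = stripos($text, $string);
    if ($pos === false) {
        return null; // search word not present
    }
    $start  = max(0, $pos - $radius);
    $length = ($pos - $start) + strlen($string) + $radius;
    return substr($text, $start, $length);
}

$text = 'An introduction about bread. In Holland peanut butter is a popular spread on sandwiches.';
echo excerpt_first($text, 'Peanut Butter', 11), "\n";
```

The second word in the result is cut off, which is exactly the disadvantage mentioned above.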

A probably more versatile method is to use regular expressions:

$chars = 100; // (the desired characters before and after)
$allowword = 20; //extra characters allowed to find a word boundary

$allow = $chars + $allowword;
$else = $chars-1; //pff, naming variables is a drag

$search = preg_quote($string, '/'); // escape all characters that could have special meaning

preg_match_all('/(^(?:.){0,'.$else.'}|\b(?:.){'.$chars.','.$allow.'})('.$search.')'
    .'((?:.){'.$chars.','.$allow.'}\b|(?:.){0,'.$else.'}$)/si',
    $text, $matches, PREG_SET_ORDER);

Now you have an array $matches that contains the search string and the
surrounding $chars characters. The expression tries to keep words whole,
with a maximum of extra characters given by $allowword. It is no problem
when there aren't that many characters before or after the search string:
in that case the match simply runs from the beginning or until the end of
the text, respectively.

$matches is now an array containing:
$matches[index_of_match][0] = The entire matched snippet.
$matches[index_of_match][1] = The preceding text.
$matches[index_of_match][2] = The search string.
$matches[index_of_match][3] = The following text.
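
To see the expression at work, here is a minimal self-contained sketch. The wrapper name find_snippets and the sample text are hypothetical, and the numbers are kept small (10 characters of context, 5 extra to reach a word boundary) so the result stays readable:

```php
<?php
// Hypothetical wrapper around the expression above; the function name and
// the sample text are illustrative only.
function find_snippets(string $text, string $string, int $chars, int $allowword): array
{
    $allow  = $chars + $allowword;
    $else   = $chars - 1;
    $search = preg_quote($string, '/');

    preg_match_all(
        '/(^(?:.){0,'.$else.'}|\b(?:.){'.$chars.','.$allow.'})('.$search.')'
        .'((?:.){'.$chars.','.$allow.'}\b|(?:.){0,'.$else.'}$)/si',
        $text, $matches, PREG_SET_ORDER
    );
    return $matches;
}

$text = 'In Holland peanut butter is a popular spread on sandwiches.';
foreach (find_snippets($text, 'peanut butter', 10, 5) as $match) {
    // [1] = text before, [2] = the search string as found, [3] = text after
    echo $match[1], '[', $match[2], ']', $match[3], "\n";
}
```

Because of PREG_SET_ORDER, each element of the returned array is one complete match with its three groups, which is what makes the simple foreach possible.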

Matches can be displayed like:
foreach($matches as $match){
print $match[0];
}

But maybe you want to highlight your searchstring, no problem:

foreach($matches as $match){
print $match[1].'<span class="highlight">'.$match[2].'</span>'.$match[3];
}
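
If the stored text can contain characters such as < or &, it is probably wise to escape the pieces before printing them into a page. A minimal sketch, assuming the output is HTML; the helper name highlight_match and the sample match are hypothetical:

```php
<?php
// Hypothetical helper: build highlighted HTML from one entry of $matches
// ([1] = text before, [2] = search string, [3] = text after).
function highlight_match(array $match): string
{
    return htmlspecialchars($match[1])
        . '<span class="highlight">' . htmlspecialchars($match[2]) . '</span>'
        . htmlspecialchars($match[3]);
}

// Made-up entry in the same shape as one element of $matches.
$match = ['1 < 2 peanut butter & jam', '1 < 2 ', 'peanut butter', ' & jam'];
echo highlight_match($match), "\n";
```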

When looking for several words, you could even change the search string like
this:

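$searcharray = array('searchstring','some other word', 'yet another');
// pass the '/' delimiter to preg_quote, otherwise a slash in a search term breaks the regex
$search = implode('|', array_map(function ($s) { return preg_quote($s, '/'); }, $searcharray));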

And just apply the same regex. Note that this will give back a match for each
word separately. How to prevent those "double" matches is a whole other
ballgame. Writing this, I realize that even searching for one term could give
you doubles.

Highlighting the other search terms can't be done using just the matches
array. While keeping the double entries, every search term can be highlighted
like:

foreach($matches as $match){
print preg_replace('/('.$search.')/si', '<span class="highlight">\1</span>', $match[0]);
}
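
A minimal self-contained sketch of highlighting several terms in one snippet. The function name highlight_terms is hypothetical; each term is quoted with the '/' delimiter so that a slash inside a term cannot end the pattern early:

```php
<?php
// Hypothetical helper: wrap every occurrence of any search term in a
// highlight span, case-insensitively.
function highlight_terms(string $snippet, array $terms): string
{
    $quoted = array_map(function ($term) {
        return preg_quote($term, '/'); // also escapes the delimiter
    }, $terms);
    $search = implode('|', $quoted);

    return preg_replace('/('.$search.')/si', '<span class="highlight">$1</span>', $snippet);
}

echo highlight_terms('Peanut butter and jam', ['peanut butter', 'jam']), "\n";
```

The replacement keeps the text as it appears in the snippet, so "Peanut" stays capitalised even though the search term was lower case.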

Doubles could be prevented by using PREG_OFFSET_CAPTURE in the following
regex:

$searcharray = array('searchstring','some other string', 'yet another');
$search = implode('|', array_map(function ($s) { return preg_quote($s, '/'); }, $searcharray));
preg_match_all('/'.$search.'/si', $text, $matches, PREG_OFFSET_CAPTURE);

Then loop through $matches[0], gathering the surrounding text with
preg_match on substrings (which makes it a lot quicker), and checking whether
or not the offset of the following match is "within reach".

Create a substring of the text around search terms that are close to each
other, with the maximum allowed characters + 1 on either side.

// $before, $span and $after are placeholders: $before = exact number of
// preceding chars, $span = exact length from the first offset to the last
// offset plus the string length of the last search term, $after = exact
// number of following chars.
preg_match_all('/(\b.{'.$chars.','.$allow.'}|^.{'.$before.'}).{'.$span.'}(.{'.$chars.','.$allow.'}\b|.{'.$after.'}$)/si',
    $substring, $combinations, PREG_SET_ORDER);

foreach($combinations as $final){
print preg_replace('/('.$search.')/si', '<span class="highlight">\1</span>', $final[0]);
}
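
The "within reach" check can be sketched roughly as follows. This is a simplified illustration, not the exact method described above: it groups hits by their byte offsets and cuts the context with substr() rather than with the word-boundary regex. The function names cluster_hits and cluster_snippets and the $reach parameter are hypothetical:

```php
<?php
// Hypothetical helper: group search hits whose gaps are at most $reach
// characters into clusters, returned as [start, end] byte offsets.
function cluster_hits(string $text, array $terms, int $reach): array
{
    $quoted = array_map(function ($t) { return preg_quote($t, '/'); }, $terms);
    preg_match_all('/'.implode('|', $quoted).'/si', $text, $matches, PREG_OFFSET_CAPTURE);

    $clusters = [];
    foreach ($matches[0] as [$hit, $offset]) {
        $end  = $offset + strlen($hit);
        $last = count($clusters) - 1;
        if ($last >= 0 && $offset - $clusters[$last][1] <= $reach) {
            $clusters[$last][1] = $end;    // close enough: extend the current cluster
        } else {
            $clusters[] = [$offset, $end]; // too far away: start a new cluster
        }
    }
    return $clusters;
}

// Cut one snippet per cluster, with $reach characters of context on either side.
function cluster_snippets(string $text, array $terms, int $reach): array
{
    $snippets = [];
    foreach (cluster_hits($text, $terms, $reach) as [$start, $end]) {
        $from = max(0, $start - $reach);
        $snippets[] = substr($text, $from, ($end - $from) + $reach);
    }
    return $snippets;
}
```

Each snippet then contains every nearby search term once, so running the highlighting replace over it should no longer produce doubles.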
Grtz,

--
Rik Wasmus
May 10 '06 #4

"Rik" <lu************@hotmail.com> wrote in message
news:e3**********@netlx020.civ.utwente.nl...
[detailed explanation]
Thank you Rik, for a "luiheidsgoeroe" (laziness guru) you did a lot of work
to resolve my problem :) This is the solution I was looking for.

Martien
May 10 '06 #5
Rik
Martien van Wanrooij wrote:
Thank you Rik, for a "luiheidsgoeroe" you did a lot of work to
resolve my problem :) This is the solution I was looking for.

No problem.
Not living up to the name indeed... it's about time to rectify that...

Grtz,
--
Rik Wasmus
May 11 '06 #6