Connecting Tech Pros Worldwide Help | Site Map

Searching text

 
LinkBack Thread Tools Search this Thread
  #1  
Old July 17th, 2005, 09:20 AM
SK
Guest
 
Posts: n/a
Default Searching text

Is there a way to store HTML into a MySQL TEXT column,
yet be able to search over textual content only?

For example, if I store the following HTML snippet:

<p>A very <em>small</em> paragraph.</p>

.... into an TEXT column, I would like to be able to search
in the text "A very small paragraph." only, meaning the search
would find "small" (even if it's enclosed in <em> tags and
thus creating string "<em>small</em>"), and ignore search
strings "<p>" and "<em>".




  #2  
Old July 17th, 2005, 09:21 AM
kingofkolt
Guest
 
Posts: n/a
Default Re: Searching text

"SK" <sk@foo.com> wrote in message news:2uci4hF27hbs0U1@uni-berlin.de...[color=blue]
> Is there a way to store HTML into a MySQL TEXT column,
> yet be able to search over textual content only?
>
> For example, if I store the following HTML snippet:
>
> <p>A very <em>small</em> paragraph.</p>
>
> ... into an TEXT column, I would like to be able to search
> in the text "A very small paragraph." only, meaning the search
> would find "small" (even if it's enclosed in <em> tags and
> thus creating string "<em>small</em>"), and ignore search
> strings "<p>" and "<em>".
>
>[/color]

Try strip_tags():
http://us2.php.net/manual/en/function.strip-tags.php


  #3  
Old July 17th, 2005, 09:21 AM
Chung Leong
Guest
 
Posts: n/a
Default Re: Searching text

"SK" <sk@foo.com> wrote in message news:2uci4hF27hbs0U1@uni-berlin.de...[color=blue]
> Is there a way to store HTML into a MySQL TEXT column,
> yet be able to search over textual content only?
>
> For example, if I store the following HTML snippet:
>
> <p>A very <em>small</em> paragraph.</p>
>
> ... into an TEXT column, I would like to be able to search
> in the text "A very small paragraph." only, meaning the search
> would find "small" (even if it's enclosed in <em> tags and
> thus creating string "<em>small</em>"), and ignore search
> strings "<p>" and "<em>".
>
>[/color]

One possible way to do this would be to strip out the tags before you save
the text and store the formatting info separately. Restore the tags only
when you need to output in HTML.

Example:

<?

$example = <<<EXAMPLE
<p>A very <em>small</em> paragraph.</p>

.... into an TEXT column, I would like to be able to search
in the text "A very small paragraph." only, meaning the search
would find "small" (even if it's enclosed in <em> tags and
thus creating string "<em>small</em>"), and ignore search
strings "<p>" and "<em>".
EXAMPLE;

function ExtractFormatting($html) {
preg_match_all('/<.*?>/', $html, $matches,
PREG_OFFSET_CAPTURE |PREG_PATTERN_ORDER);
$tags = $matches[0];
$formatting = array();
foreach($tags as $tag) {
list($tag_html, $tag_offset) = $tag;
$formatting[$tag_offset] = $tag_html;
}
$text = preg_replace('/<.*?>/', '', $html);
return array($text, $formatting);
}

function ApplyFormatting($text, $formatting) {
$html = $text;
foreach($formatting as $tag_offset => $tag_html) {
$html = substr($html, 0, $tag_offset)
. $tag_html
. substr($html, $tag_offset);
}
return $html;
}

list($text, $formatting) = ExtractFormatting($example);

echo "<pre style='border:1px solid black'>$text</pre>";

$html = ApplyFormatting($text, $formatting);

echo "<div style='border:1px solid black'>$html</div>";

?>

To store the formatting info, just serialize the array and dump it into
another text column.


 

Bookmarks

Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On

Popular Articles

What is Bytes?

We are a network of experts and professionals in IT and software development that help one another with answers to tough questions and share insights. Get the best answers to your questions from over 220,989 network members.