Connecting Tech Pros Worldwide Help | Site Map

Searching text

SK
Guest
 
Posts: n/a
#1: Jul 17 '05
Is there a way to store HTML into a MySQL TEXT column,
yet be able to search over textual content only?

For example, if I store the following HTML snippet:

<p>A very <em>small</em> paragraph.</p>

.... into an TEXT column, I would like to be able to search
in the text "A very small paragraph." only, meaning the search
would find "small" (even if it's enclosed in <em> tags and
thus creating string "<em>small</em>"), and ignore search
strings "<p>" and "<em>".



kingofkolt
Guest
 
Posts: n/a
#2: Jul 17 '05

re: Searching text


"SK" <sk@foo.com> wrote in message news:2uci4hF27hbs0U1@uni-berlin.de...[color=blue]
> Is there a way to store HTML into a MySQL TEXT column,
> yet be able to search over textual content only?
>
> For example, if I store the following HTML snippet:
>
> <p>A very <em>small</em> paragraph.</p>
>
> ... into an TEXT column, I would like to be able to search
> in the text "A very small paragraph." only, meaning the search
> would find "small" (even if it's enclosed in <em> tags and
> thus creating string "<em>small</em>"), and ignore search
> strings "<p>" and "<em>".
>
>[/color]

Try strip_tags():
http://us2.php.net/manual/en/function.strip-tags.php


Chung Leong
Guest
 
Posts: n/a
#3: Jul 17 '05

re: Searching text


"SK" <sk@foo.com> wrote in message news:2uci4hF27hbs0U1@uni-berlin.de...[color=blue]
> Is there a way to store HTML into a MySQL TEXT column,
> yet be able to search over textual content only?
>
> For example, if I store the following HTML snippet:
>
> <p>A very <em>small</em> paragraph.</p>
>
> ... into an TEXT column, I would like to be able to search
> in the text "A very small paragraph." only, meaning the search
> would find "small" (even if it's enclosed in <em> tags and
> thus creating string "<em>small</em>"), and ignore search
> strings "<p>" and "<em>".
>
>[/color]

One possible way to do this would be to strip out the tags before you save
the text and store the formatting info separately. Restore the tags only
when you need to output in HTML.

Example:

<?

$example = <<<EXAMPLE
<p>A very <em>small</em> paragraph.</p>

.... into an TEXT column, I would like to be able to search
in the text "A very small paragraph." only, meaning the search
would find "small" (even if it's enclosed in <em> tags and
thus creating string "<em>small</em>"), and ignore search
strings "<p>" and "<em>".
EXAMPLE;

function ExtractFormatting($html) {
preg_match_all('/<.*?>/', $html, $matches,
PREG_OFFSET_CAPTURE |PREG_PATTERN_ORDER);
$tags = $matches[0];
$formatting = array();
foreach($tags as $tag) {
list($tag_html, $tag_offset) = $tag;
$formatting[$tag_offset] = $tag_html;
}
$text = preg_replace('/<.*?>/', '', $html);
return array($text, $formatting);
}

function ApplyFormatting($text, $formatting) {
$html = $text;
foreach($formatting as $tag_offset => $tag_html) {
$html = substr($html, 0, $tag_offset)
. $tag_html
. substr($html, $tag_offset);
}
return $html;
}

list($text, $formatting) = ExtractFormatting($example);

echo "<pre style='border:1px solid black'>$text</pre>";

$html = ApplyFormatting($text, $formatting);

echo "<div style='border:1px solid black'>$html</div>";

?>

To store the formatting info, just serialize the array and dump it into
another text column.


Closed Thread