"SK" <sk@foo.com> wrote in message news:2u*************@uni-berlin.de...
Is there a way to store HTML into a MySQL TEXT column,
yet be able to search over textual content only?
For example, if I store the following HTML snippet:
<p>A very <em>small</em> paragraph.</p>
... into an TEXT column, I would like to be able to search
in the text "A very small paragraph." only, meaning the search
would find "small" (even if it's enclosed in <em> tags and
thus creating string "<em>small</em>"), and ignore search
strings "<p>" and "<em>".
One possible way to do this would be to strip out the tags before you save
the text and store the formatting info separately. Restore the tags only
when you need to output in HTML.
Example:
<?
$example = <<<EXAMPLE
<p>A very <em>small</em> paragraph.</p>
.... into an TEXT column, I would like to be able to search
in the text "A very small paragraph." only, meaning the search
would find "small" (even if it's enclosed in <em> tags and
thus creating string "<em>small</em>"), and ignore search
strings "<p>" and "<em>".
EXAMPLE;
function ExtractFormatting($html) {
preg_match_all('/<.*?>/', $html, $matches,
PREG_OFFSET_CAPTURE |PREG_PATTERN_ORDER);
$tags = $matches[0];
$formatting = array();
foreach($tags as $tag) {
list($tag_html, $tag_offset) = $tag;
$formatting[$tag_offset] = $tag_html;
}
$text = preg_replace('/<.*?>/', '', $html);
return array($text, $formatting);
}
function ApplyFormatting($text, $formatting) {
$html = $text;
foreach($formatting as $tag_offset => $tag_html) {
$html = substr($html, 0, $tag_offset)
. $tag_html
. substr($html, $tag_offset);
}
return $html;
}
list($text, $formatting) = ExtractFormatting($example);
echo "<pre style='border:1px solid black'>$text</pre>";
$html = ApplyFormatting($text, $formatting);
echo "<div style='border:1px solid black'>$html</div>";
?>
To store the formatting info, just serialize the array and dump it into
another text column.