Ch********@gmail.com wrote:
I have a problem: I want to filter out all the words
in HTML.
Before Filter:
<div id="pp">hello man <br/>Thank's for your answer.
</div>
After Filter:
<div id="pp"><br/></div>
Forget regexes. As the saying goes, 'You cannot parse HTML
with regexes'. There's also no reason to write your own
HTML parser -- there already are more than enough of those.
XSLT was meant exactly for this type of processing, and it
doesn't really care what you're processing, as long as it's
a DOMDocument.
Using PHP5's DOM and XSL modules:
<?php
// Sample input. loadHTML() tolerates tag soup and wraps the
// fragment in html/body elements.
$xml_str =
    '<div id="pp">hello man <br/>Thank\'s for your ' .
    'answer. </div>';
// Identity transform that unwraps html, turns body into result,
// and drops every text node. The text() template comes last so
// that it wins over the identity template for text nodes.
$xsl_str =
    '<xsl:stylesheet ' .
    ' xmlns:xsl="http://www.w3.org/1999/XSL/Transform" ' .
    ' version="1.0">' .
    ' <xsl:template match="node()|@*">' .
    '  <xsl:copy>' .
    '   <xsl:apply-templates select="node()|@*"/>' .
    '  </xsl:copy>' .
    ' </xsl:template>' .
    ' <xsl:template match="html">' .
    '  <xsl:apply-templates/>' .
    ' </xsl:template>' .
    ' <xsl:template match="body">' .
    '  <result>' .
    '   <xsl:apply-templates/>' .
    '  </result>' .
    ' </xsl:template>' .
    ' <xsl:template match="text()"/>' .
    '</xsl:stylesheet>';
// Use instances: static calls to loadHTML()/loadXML() raise an
// error in PHP 8.
$xml = new DOMDocument();
$xml->loadHTML($xml_str);
$xsl = new DOMDocument();
$xsl->loadXML($xsl_str);
$xform = new XSLTProcessor();
$xform->importStylesheet($xsl);
$result = $xform->transformToDoc($xml);
header('Content-type: text/xml');
print($result->saveXML());
?>
If you're using real XHTML (as opposed to mumbo jumbo tag
soup pretending to be XHTML), it's even better, because you
don't have to pretend you're processing XML. XHTML *is*
XML.
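
To illustrate the XHTML case, here is a minimal sketch, assuming the input is a well-formed fragment in the XHTML namespace. The sample string and the variable name $xhtml_str are made up for illustration. Since loadXML() does not wrap the input in html/body the way loadHTML() does, the html and body templates should not be needed here, and the identity template plus the empty text() template ought to be enough.

```php
<?php
// Assumed sample input: well-formed XHTML, declared in the XHTML namespace.
$xhtml_str =
    '<div xmlns="http://www.w3.org/1999/xhtml" id="pp">' .
    'hello man <br/>Thank\'s for your answer. </div>';

// Identity transform plus an empty template that drops text nodes.
// The text() template comes last so it takes precedence for text nodes.
$xsl_str =
    '<xsl:stylesheet ' .
    ' xmlns:xsl="http://www.w3.org/1999/XSL/Transform" ' .
    ' version="1.0">' .
    ' <xsl:template match="node()|@*">' .
    '  <xsl:copy>' .
    '   <xsl:apply-templates select="node()|@*"/>' .
    '  </xsl:copy>' .
    ' </xsl:template>' .
    ' <xsl:template match="text()"/>' .
    '</xsl:stylesheet>';

// Strict XML parsing: this step fails on tag soup instead of repairing it.
$xml = new DOMDocument();
$xml->loadXML($xhtml_str);

$xsl = new DOMDocument();
$xsl->loadXML($xsl_str);

$xform = new XSLTProcessor();
$xform->importStylesheet($xsl);
$result = $xform->transformToDoc($xml);

print($result->saveXML());
?>
```

The result keeps the div, its id attribute and the br element, with the text gone. If you later need templates that match specific XHTML elements, remember that they live in the XHTML namespace, so the match patterns need a prefix bound to it.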
--
Pavel Lepin