
XML parsing and HTML comments

Colin McKinnon
#1: Sep 13 '05
Hi all,

I find myself writing a template system (yeah, I know, but there is a
reason I'm not using an existing one). So I'm trying to parse XHTML using
the built-in expat parser. Mostly it works fine; however, it ignores anything
that looks like an HTML comment. This is a bit of a problem, as I see a lot
of code written like:

<script type='text/javascript'>
<!--
alert("hello world");
// -->
</script>
<style type="text/css">
<!--
.style1 {
font-family: Verdana, Arial, Helvetica, sans-serif;
font-size: 9px;
}
-->
</style>

Now obviously the browser is seeing the stuff inside '<!--' ...'-->' but
expat doesn't. I tried adding a non-parsed handler, but still can't see it.
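
For reference, a minimal sketch of the setup described above, assuming PHP's xml extension is loaded. The sample document and variable names are made up for illustration. It registers only element and character data handlers, then checks whether the commented-out script body ever reaches the character data handler.

```php
<?php
// Sketch only: a tiny stand-in for the XHTML described in the post.
$xml = "<html><script type='text/javascript'>\n<!--\nalert('hello world');\n// -->\n</script></html>";

$text = '';      // everything the character data handler receives
$tags = array(); // element names, in the order they open

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$tags) { $tags[] = $name; },
    function ($parser, $name) { /* end tags are not needed here */ }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$text) { $text .= $data; }
);
$ok = xml_parse($parser, $xml, true);

// The elements are reported (upper-cased, since case folding is on by
// default), but the text inside the comment never shows up in $text.
var_dump($ok, $tags, strpos($text, 'alert'));
```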

Anybody fixed this?

C.
Andy Hassall
#2: Sep 14 '05

On Tue, 13 Sep 2005 13:27:18 +0100, Colin McKinnon
<colin.deletethis@andthis.mms3.com> wrote:
>I find myself writing a template system (yeah, I know - but there is a
>reason I'm not using an existing one). So I'm trying to parse xhtml using
>the builtin expat parser. Mostly it works fine, however it ignores anything
>that looks like an HTML comment. This is a bit of a problem as I see a lot
>of code written like:
>
><script type='text/javascript'>
><!--
> alert("hello world");
>// -->
></script>
><style type="text/css">
><!--
>.style1 {
> font-family: Verdana, Arial, Helvetica, sans-serif;
> font-size: 9px;
>}
>-->
></style>
>
>Now obviously the browser is seeing the stuff inside '<!--' ...'-->' but
>expat doesn't. I tried adding a non-parsed handler, but still can't see it.
>
>Anybody fixed this?

expat (the XML parser used in these functions) has support for adding comment
handlers, but that doesn't appear to be hooked into the PHP extension, so you
can't get at that functionality without patching the source of the extension.

If you look in the PHP source, under ext/xml/xml.c you see:

/* Short-term TODO list:
 * - Implement XML_ExternalEntityParserCreate()
 * - XML_SetCommentHandler
 * - XML_SetCdataSectionHandler
 * - XML_SetParamEntityParsing
 */

The second item, XML_SetCommentHandler, is the one you want.
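
One quick way to see this from PHP itself is to probe which handler setters the extension exposes. This is only a sketch: `xml_set_comment_handler` is a hypothetical name, standing in for what a binding to expat's XML_SetCommentHandler might be called if it existed.

```php
<?php
// Handler setters to probe. The first three are real functions in PHP's
// xml extension; the last name is hypothetical and is only there to show
// that no comment handler setter is exposed.
$setters = array(
    'xml_set_element_handler',
    'xml_set_character_data_handler',
    'xml_set_processing_instruction_handler',
    'xml_set_comment_handler',
);

$available = array();
foreach ($setters as $name) {
    $available[$name] = function_exists($name);
}

var_dump($available);
```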

--
Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
Colin McKinnon
#3: Sep 14 '05

Andy Hassall wrote:
> On Tue, 13 Sep 2005 13:27:18 +0100, Colin McKinnon
> <colin.deletethis@andthis.mms3.com> wrote:
>
>> Mostly it works fine, however it ignores
>>anything that looks like an HTML comment. This is a bit of a problem as I
>>see a lot of code written like:
>>
>><script type='text/javascript>
>><!--
>
> expat (the XML parser used in these functions) has support for adding
> comment handlers, but that doesn't appear to be hooked into the PHP
> extension, so you can't get at that functionality without patching the
> source of the extension.

erk.

Thanks Andy. At least I know I'm not doing something stupid.

For software I'm planning to release, patching the source isn't an ideal
solution. I managed to implement a workaround by running this on the XML
first:

// Wrap each comment in a CDATA section so it arrives as character data.
$xml = str_replace('<!--', '<![CDATA[<!--', $xml);
$xml = str_replace('-->', '-->]]>', $xml);

(again not ideal, but hopefully less painful than recompiling/maintaining
expat)
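
A rough end-to-end sketch of that workaround, assuming PHP's xml extension is loaded. The helper name `wrap_comments_in_cdata` and the sample stylesheet are made up for illustration; the two replacements are the ones given above.

```php
<?php
// Hypothetical helper: wraps every comment in a CDATA section so the
// parser hands its text to the character data handler.
function wrap_comments_in_cdata($xml)
{
    $xml = str_replace('<!--', '<![CDATA[<!--', $xml);
    return str_replace('-->', '-->]]>', $xml);
}

$xml = "<style type=\"text/css\">\n<!--\n.style1 { font-size: 9px; }\n-->\n</style>";
$wrapped = wrap_comments_in_cdata($xml);

$text = '';
$parser = xml_parser_create();
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$text) {
        $text .= $data; // may be called several times for one text node
    }
);
$ok = xml_parse($parser, $wrapped, true);

echo $wrapped, "\n";
// Both the comment markers and the style rule now arrive as text.
var_dump($ok, strpos($text, '<!--') !== false, strpos($text, '.style1') !== false);
```

This is blind string replacement, so it has limits. For example, a comment that already sits inside a CDATA section, or one placed outside the root element, would most likely turn into a parse error after wrapping.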

C.