"Nik Coughin" <nr***********@woosh.co.nz> wrote in message
news:GZ*******************@news.xtra.co.nz...
> Looking for a function that sanitises a string, i.e. removes any JavaScript,
> frames, iframes (have I missed anything? any other dangerous HTML that
> should be stripped?) and also prevents SQL attacks. If I have to, I'll
> just do a little research and write it myself, but it's always nice not to
> have to reinvent the wheel. Something nice and simple, like
> $str = sanitise( $str ); would be ideal.
HTML is notoriously difficult to sanitize. JavaScript can appear in a number
of different places: between <script> tags, linked in by a <link> tag, in
onXXXX event handlers, in href and src attributes, in CSS declarations, and
possibly elsewhere. You also have to worry about <object> and <embed>. The
rarely used <base> tag can hijack all of your relative links. A <style> tag
can make everything disappear ("body {display:none}"). Even inline style is
dangerous, since it lets someone position an element anywhere on the page,
e.g. a fake toolbar that covers up the real one.
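A quick illustration of why attributes matter as much as tags: PHP's built-in strip_tags() accepts a list of tags to keep, but, as the PHP manual itself warns, it leaves the attributes of those kept tags alone. A minimal sketch (the hostile input string is made up for illustration):

```php
<?php
// Made-up hostile input: an allowed tag carrying an event handler,
// followed by a script block.
$input = '<b onmouseover="alert(1)">hi</b><script>steal()</script>';

// Keep only <b>; strip_tags() removes every other tag...
$out = strip_tags($input, '<b>');

// ...but it does not touch attributes on the tags it keeps,
// so the onmouseover handler is still there.
var_dump(strpos($out, 'onmouseover') !== false); // bool(true)
echo $out, "\n";
```

So an allow-list that only checks tag names is not enough; the attributes have to go too.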
It's also very tricky to write regexps that look for these tags. Internet
Explorer will ignore char(0), for example. "<s\0cript..." will be
interpreted as "<script...". And then there are second-order attacks to watch
for, where the attack code is formed only after an offending tag is removed
(e.g. "<scr<script> dummie = 0; </script>ipt> ... ").
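Both regexp pitfalls can be sketched in PHP, assuming a naive filter that deletes <script>...</script> blocks with a single preg_replace() pass. The function name naive_strip_scripts() and its pattern are hypothetical, for illustration only, and the sketch does not model Internet Explorer itself; it just shows what such a filter lets through:

```php
<?php
// Hypothetical naive filter: remove <script>...</script> blocks in ONE pass.
function naive_strip_scripts(string $html): string
{
    return preg_replace('#<script\b[^>]*>.*?</script>#is', '', $html);
}

// 1. Embedded NUL byte: the pattern never matches, so the string passes
//    through unchanged (a browser that ignores \0 would still see <script>).
$nul = "<s\0cript>alert(1)</script>";
var_dump(naive_strip_scripts($nul) === $nul); // bool(true)

// 2. Second-order attack: deleting the inner block glues "<scr" and "ipt>"
//    back together into a fresh <script> tag.
$nested = '<scr<script> dummie = 0; </script>ipt>alert(1)</script>';
$once = naive_strip_scripts($nested);
echo $once, "\n"; // <script>alert(1)</script>
```

Repeating the replacement until the string stops changing would catch the nested trick, but not the NUL-byte one, which is why hunting for bad tags one regexp at a time is a losing game.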
There are two reasonable approaches to this problem:
A. Don't allow HTML. Pass everything through htmlspecialchars() before
echoing it.
B. Look for tags that you do allow, replace them with placeholders (e.g. <b>
=> [[[b]]]), strip off all other tags, and change the placeholders back to
tags.
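Both approaches can be sketched in PHP, assuming PHP 7 or later (for the type declarations) and UTF-8 input. The function names sanitise_escape() and sanitise_allowlist() and the default allow-list are hypothetical; htmlspecialchars(), strip_tags(), str_ireplace() and str_replace() are the real PHP functions doing the work. For B, only bare tags with no attributes become placeholders, since attributes are where onXXXX handlers and inline style hide. The htmlspecialchars() pass after strip_tags() is an extra safety net, not part of the recipe above.

```php
<?php
// Approach A: allow no HTML at all; escape everything before echoing.
function sanitise_escape(string $str): string
{
    // ENT_QUOTES also escapes ' and ", which matters inside attributes.
    return htmlspecialchars($str, ENT_QUOTES, 'UTF-8');
}

// Approach B: keep a small allow-list of bare tags via placeholders.
// The default tag list is only an example.
function sanitise_allowlist(string $str, array $allowed = ['b', 'i', 'em', 'strong']): string
{
    // 1. Swap attribute-free allowed tags for placeholders, e.g. <b> => [[[b]]].
    //    A tag with attributes (<b onclick=...>) is not matched and gets stripped.
    foreach ($allowed as $tag) {
        $str = str_ireplace(["<$tag>", "</$tag>"], ["[[[$tag]]]", "[[[/$tag]]]"], $str);
    }

    // 2. Strip all remaining tags, then escape anything strip_tags() left
    //    behind (stray "<", quotes, etc.).
    $str = htmlspecialchars(strip_tags($str), ENT_QUOTES, 'UTF-8');

    // 3. Turn the placeholders back into real tags.
    foreach ($allowed as $tag) {
        $str = str_replace(["[[[$tag]]]", "[[[/$tag]]]"], ["<$tag>", "</$tag>"], $str);
    }
    return $str;
}

echo sanitise_escape('<script>alert(1)</script>'), "\n";
// &lt;script&gt;alert(1)&lt;/script&gt;
echo sanitise_allowlist('<B>bold</B> and <span onclick="x()">text</span><script>alert(1)</script>'), "\n";
```

A is the simpler and safer default; B lets users keep basic formatting at the cost of more code to get right. Note that B does not balance tags, so a lone </b> can still get through.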