
Regular expression for cleaning html safely

Hi,

I'm building a web site that renders HTML taken from various user input.
The problem is that this HTML cannot be trusted, so I need to ensure it does
not contain script injection attacks.
That's why I'd like to define a set of allowed tags and remove all the
others.

I was thinking about using a regular expression. I was able to find some regex
samples that remove a set of untrusted tags (script, iframe, etc.), but I'd
rather allow only a set of safe tags, because those regexes can only remove
"well-formed" tags: a <script> without a closing </script> won't be removed.

So does anyone have a regex that removes any content between tags that are
not in a safe list?
And, if possible, can it also remove any attribute that is
potentially dangerous? (<span onload="javascript:attack(...)">)

Thanks in advance
Sep 4 '06 #1
1 Reply


You may give www.regexlib.com a shot.
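
If a single pattern turns out to be too fragile for the unclosed-tag case described in the question, the allow-list idea can also be sketched with a parser instead of a regex. The following is only a rough illustration in Python using the standard html.parser module, not a vetted sanitizer. Every name in it (ALLOWED_TAGS, ALLOWED_ATTRS, SAFE_SCHEMES, DROP_CONTENT, AllowListSanitizer, sanitize) is hypothetical, and the lists themselves are assumptions to adapt to your site.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-lists: adjust to what the site really needs.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}            # attributes kept, per tag
SAFE_SCHEMES = ("http://", "https://", "mailto:")
DROP_CONTENT = {"script", "style", "iframe"}  # drop the text inside these too
VOID_TAGS = {"br"}                         # tags that never get a closing tag


class AllowListSanitizer(HTMLParser):
    """Rebuilds the markup, keeping only allow-listed tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()):
                continue  # drops onload, onclick, style, ...
            value = value or ""
            # Assumption: only absolute URLs with a known scheme are allowed.
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text is always re-escaped, so leftover "<" cannot form a tag.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = AllowListSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<span onload="javascript:attack()">hello</span>'))
    print(sanitize('<a href="javascript:alert(1)" onclick="x()">link</a>'))
    print(sanitize('before<script>alert(1)'))
```

Because the output is rebuilt from scratch rather than edited in place, anything the parser does not recognise as an allowed tag ends up either dropped or escaped as text. Comments and declarations are dropped as well, since the sketch does not override those handlers. Relative links are rejected by the scheme check, which may be stricter than you want.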

Sep 4 '06 #2
