
About a class I wrote to filter bad html input

Please take a look at a class I just wrote. It is meant to filter
suspicious HTML input coming from an online HTML editor. I need help with
two things:
1. Does it need to filter more things? I assume it does, but I don't
know where to improve it.
2. As you can see, I try to filter every link. If the target address
does not start with "http://" or "mailto:", its href is replaced with an
empty string. I think the code could be rewritten to perform better,
but how?
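Imports System.Text.RegularExpressions

Public Class strOp
    Public Function filterHtml(ByVal s As String) As String
        ' Strip script tags, iframes and server-side includes.
        s = Regex.Replace(s, _
            "<script>|</script>|<iframe.*?>|<!--#include.*?>", "", _
            RegexOptions.IgnoreCase)
        ' Strip any tag that carries one of these event handlers.
        s = Regex.Replace(s, "<.*? (?:onload|onclick|ondblclick)[ ]?=[ ]?.*?>", _
            "", RegexOptions.IgnoreCase)
        ' Capture the href value of every anchor tag in group 1.
        Dim re As New Regex("<a .*?href\s*=\s*[""]?([^"" >]*)[""]?.*?>", _
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Dim href As String
        For Each m As Match In re.Matches(s)
            href = m.Groups(1).Value
            ' Blank out links that are neither http:// nor mailto:.
            ' Replace the tag exactly as matched so mixed-case tags are found.
            If Not (href.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase) OrElse _
                    href.StartsWith("http://", StringComparison.OrdinalIgnoreCase)) Then
                s = s.Replace(m.Value, "<a href=''>")
            End If
        Next
        Return s
    End Function
End Class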
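
On the performance question in point 2, one option might be to let Regex.Replace call a MatchEvaluator, so each anchor is checked and rewritten in a single pass instead of collecting the matches and then calling String.Replace once per bad link. The following is only a sketch under that assumption: the module name LinkFilterSketch and the helpers CheckAnchor and FilterLinks are made up for illustration, and the pattern is the one from the class above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Public Module LinkFilterSketch
    ' Same anchor pattern as in the post, built once and reused.
    Private ReadOnly AnchorRe As New Regex( _
        "<a .*?href\s*=\s*[""]?([^"" >]*)[""]?.*?>", _
        RegexOptions.IgnoreCase Or RegexOptions.Singleline Or RegexOptions.Compiled)

    ' Called once per matched anchor tag; returns the text that replaces it.
    Private Function CheckAnchor(ByVal m As Match) As String
        Dim href As String = m.Groups(1).Value
        If href.StartsWith("http://", StringComparison.OrdinalIgnoreCase) OrElse _
           href.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase) Then
            ' Allowed target: keep the tag unchanged.
            Return m.Value
        End If
        ' Anything else: blank the link.
        Return "<a href=''>"
    End Function

    ' Hypothetical helper name; filters every anchor in one pass over s.
    Public Function FilterLinks(ByVal s As String) As String
        Return AnchorRe.Replace(s, New MatchEvaluator(AddressOf CheckAnchor))
    End Function
End Module
```

With this sketch, FilterLinks("<a href=""javascript:alert(1)"">x</a>") should give back <a href=''>x</a>, while an anchor pointing at an http:// address is left as it was. Whether it is actually faster on real editor input would need measuring.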

Sep 26 '06 #1
1 Reply

A small update: I changed the second Regex.Replace call from
s = Regex.Replace(s, "<.*? (?:onload|onclick|ondblclick)[ ]?=[ ]?.*?>", "", RegexOptions.IgnoreCase)
to
s = Regex.Replace(s, "<.*?\s*(?:on)[a-z]*\s*=\s*.*?>", "", RegexOptions.IgnoreCase)
so that it matches all DHTML events.
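
One thing to watch with the broadened pattern: as written it also seems to hit ordinary attributes that merely end in "on" (action=, for example), and because .*? can run past the end of a tag it can swallow plain text such as <p>decision = 1</p>. A tighter variant might require whitespace before the "on" and keep the whole match inside one tag. This is a rough sketch only; the module name EventFilterSketch and the helper StripEventTags are hypothetical, and a regex like this will not catch every way of writing an event handler.

```vbnet
Imports System.Text.RegularExpressions

Public Module EventFilterSketch
    ' Hypothetical helper: removes any tag that carries an on* attribute.
    Public Function StripEventTags(ByVal s As String) As String
        ' [^>]* keeps the match inside a single tag; the \s before "on"
        ' skips ordinary attributes such as action= that only end in "on".
        Return Regex.Replace(s, "<[^>]*\son[a-z]+\s*=[^>]*>", "", _
                             RegexOptions.IgnoreCase)
    End Function
End Module
```

For example, StripEventTags("<img src=""a.gif"" onerror=""x()"">ok") should return just ok, while <form action=""a.aspx""> and <p>decision = 1</p> pass through unchanged.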

Sep 26 '06 #2
