About a class I wrote to filter bad html input

Please take a look at my newly written class. It is meant to filter
suspicious HTML input from an online HTML editor. I need help with two
things:
1. Does it need to filter more things? I think it certainly does,
although I don't know where to improve it.
2. As you can see, I try to filter every link. If the target address
does not start with "http://" or "mailto:", it is replaced with an empty
string. I think this code could be rewritten to perform better, but
how?
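Imports System.Text.RegularExpressions

Public Class strOp
    Public Function filterHtml(ByVal s As String) As String
        ' Strip script tags, iframes and server-side includes.
        s = Regex.Replace(s, _
            "<script>|</script>|<iframe.*?>|<!--#include.*?>", "", _
            RegexOptions.IgnoreCase)
        ' Strip any tag that carries one of these event handlers.
        s = Regex.Replace(s, "<.*? (?:onload|onclick|ondblclick)[ ]?=[ ]?.*?>", "", _
            RegexOptions.IgnoreCase)
        ' Capture the href target of each <a> tag in group 1.
        Dim re As New Regex("<a .*?href\s*=\s*[""]?([^"" >]*)[""]?.*?>", _
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Dim m As Match
        Dim s1, s2 As String
        Dim ms As MatchCollection
        ms = re.Matches(s)
        For Each m In ms
            ' Keep the tag exactly as it appears in s so that Replace can find it.
            s1 = m.Value
            ' Lower-case only the captured target for the scheme check.
            s2 = m.Groups(1).Value.ToLower()
            If Not (s2.StartsWith("mailto:") OrElse _
                    s2.StartsWith("http://")) Then
                s = s.Replace(s1, "<a href=''>")
            End If
        Next
        Return s
    End Function
End Class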
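
On question 2, one possible direction (only a sketch, not a tested replacement) is to let Regex.Replace call a MatchEvaluator, so every link is checked and rewritten in a single pass instead of collecting matches and then calling String.Replace once per link. The module name, FilterLinks, and CheckLink are made-up names for illustration; the pattern is the same one used in filterHtml.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LinkFilterSketch
    ' Same link pattern as in filterHtml; group 1 captures the href target.
    Private ReadOnly LinkPattern As New Regex( _
        "<a .*?href\s*=\s*[""]?([^"" >]*)[""]?.*?>", _
        RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' Called once per matched <a> tag; returns the text that replaces it.
    Private Function CheckLink(ByVal m As Match) As String
        Dim target As String = m.Groups(1).Value
        If target.StartsWith("http://", StringComparison.OrdinalIgnoreCase) OrElse _
           target.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase) Then
            ' Allowed scheme: keep the tag unchanged.
            Return m.Value
        End If
        ' Anything else: blank the target, as the original code does.
        Return "<a href=''>"
    End Function

    ' Hypothetical helper: filters all links in one pass over the input.
    Public Function FilterLinks(ByVal s As String) As String
        Return LinkPattern.Replace(s, AddressOf CheckLink)
    End Function

    Sub Main()
        ' prints: <a href=''>x</a>
        Console.WriteLine(FilterLinks("<a href=""javascript:alert(1)"">x</a>"))
        ' prints: <a HREF="HTTP://example.com/">y</a>
        Console.WriteLine(FilterLinks("<a HREF=""HTTP://example.com/"">y</a>"))
    End Sub
End Module
```

Because the evaluator works on each match directly, the scheme check can ignore case without lower-casing the tag, and the regex object is built once rather than on every call.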

Sep 26 '06 #1
A small update:
I changed the 4th line from
s = Regex.Replace(s, "<.*? (?:onload|onclick|ondblclick)[ ]?=[ ]?.*?>", "", RegexOptions.IgnoreCase)
to
s = Regex.Replace(s, "<.*?\s*(?:on)[a-z]*\s*=\s*.*?>", "", RegexOptions.IgnoreCase)
so that it matches all DHTML events.
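
A quick way to see what the broadened pattern catches is to run it against a few inputs. The sample strings below are made up for illustration, and the module name is hypothetical; this is only a sanity check, not a full test.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EventPatternCheck
    Sub Main()
        ' The broadened pattern from the update above.
        Dim pattern As String = "<.*?\s*(?:on)[a-z]*\s*=\s*.*?>"
        ' Made-up inputs, for illustration only.
        Dim samples() As String = { _
            "<img src=""a.gif"" onmouseover=""x()"">", _
            "<a href=""http://example.com/?version=2"">", _
            "<b>bold</b>"}
        For Each sample As String In samples
            Dim hit As Boolean = Regex.IsMatch(sample, pattern, RegexOptions.IgnoreCase)
            ' Prints True for the first two samples and False for the third.
            Console.WriteLine("{0} -> {1}", hit, sample)
        Next
    End Sub
End Module
```

The second sample is a harmless link, but it still matches: "version=" ends in "on=", and \s* allows zero whitespace in front of it. Requiring whitespace there (\s+) may narrow the pattern, although text inside attribute values could still trip it.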

Sep 26 '06 #2
