"Seth Russell" <ru**********@gmail.com> writes:
>> I don't know what it is, but it probably doesn't work in my browser
>> anyway ... checking ... well, at least I can write HTML in it.
> Not in my version of it, I suppressed the "look at html" check box.
> Did the WYSIWYG not work in your browser? Which browser is that?
Opera. It doesn't have formatted text input functionality. I don't
know whether any browser other than IE and the Mozilla-based ones has
such a proprietary feature.
> Yes, yes ... care to point me to a routine in PHP that does that?
> Needs to
> * disallow all scripts
> * disallow broken HTML - this is going out on an Atom / RSS feed and
>   needs to be perfect XHTML
I'd go the safer way and choose what to allow, not what to deny.
Any text formatting tags should be retained (b, i, u, em, strong,
br, perhaps even p). No attributes should be allowed (no event
handlers or style attributes[1], and the rest doesn't really matter
then). If any of these elements are not closed, it's not a big deal,
but you could count starts and ends and add the missing ends.
So in JavaScript, I would do something like:
---
// list of allowed tag names (lowercase)
var allowed = ['b','i','u','em','strong','br'];
// RegExp matching the text before a tag, then the tag (or end of string)
var tagRE = /([\s\S]*?)(<(\/?)(\w+)\b[^>]*>|$)/g;
// RegExp matching allowed tag names
var validRE = new RegExp("^(" + allowed.join("|") + ")$");
// replace all non-allowed tags and make sure all allowed tags are closed
function sanitize(html) {
  // stack of open tags
  var open = [];
  // for each tag, replace with ...
  return html.replace(tagRE, function(_, before, tag, end, name) {
    var result, top;
    // escape & and < in non-tag text (& first, so "&lt;" is not re-escaped)
    before = before.replace(/&/g, "&amp;").replace(/</g, "&lt;");
    if (name) { // contains a tag - not end of string
      name = name.toLowerCase(); // XHTML tag names are lowercase
      if (validRE.test(name)) { // allowed tag
        if (name == "br") { // empty element: emit self-closed, never push
          return end ? before : before + "<br />";
        } else if (!end) { // allowed start tag
          open.push(name);
          return before + "<" + name + ">";
        } else { // allowed end tag
          // close open tags down to the matching start tag; if there is
          // no matching start tag, this closes everything that is open
          result = [before];
          while (top = open.pop()) {
            result.push("</", top, ">");
            if (top == name) { break; }
          }
          return result.join("");
        }
      } else { // unallowed tags.
        return before;
      }
    } else { // end of string
      result = [before];
      while (open.length > 0) {
        result.push("</", open.pop(), ">");
      }
      return result.join("");
    }
  });
}
---
I.e., pick out tags and in-between text, escape all "<" and "&" in text,
remove all unallowed tags, remove all attributes from allowed tags,
and close all open tags correctly (remove incorrect closing tags).
While this might not give exactly what an author intended for some
invalid HTML, he really has only himself to blame :)
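Since the result is meant for a feed that must be well-formed, it may be
worth checking the sanitized output before publishing it. The following is
only a sketch of such a check: isWellNested is a hypothetical helper of my
own naming, not part of any library, and it assumes the output contains
nothing but plain start tags, end tags and self-closed tags like <br />.

```javascript
// Hypothetical helper: returns true if every start tag in `html` is
// closed by a matching end tag in the correct (nested) order.
function isWellNested(html) {
  // groups: 1 = "/" of an end tag, 2 = tag name, 3 = "/" of a self-closed tag
  var tagRE = /<(\/?)(\w+)\b[^>]*?(\/?)>/g;
  var open = []; // stack of currently open tag names
  var m;
  while ((m = tagRE.exec(html)) !== null) {
    if (m[3] === "/") { continue; }      // self-closed, e.g. <br />
    if (m[1] === "/") {                  // end tag: must match top of stack
      if (open.pop() !== m[2]) { return false; }
    } else {                             // start tag
      open.push(m[2]);
    }
  }
  return open.length === 0;              // nothing may be left open
}

isWellNested("<b>bold <i>both</i></b>"); // true
isWellNested("<b>bold <i>both</b></i>"); // false (wrong order)
isWellNested("<b>never closed");         // false
```

This only checks tag nesting; it says nothing about entities or character
encoding, which a real XML parser would also verify.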
I have no idea how to convert this to PHP, but a competent PHP'er will
probably know how.
/L
[1] Yes, style attributes can be dangerous too (works in, at least, IE):
<b style="background-image:
url(javascript
:document.location.href='http://mysexsite.example.com/')">
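Dropping every attribute takes care of this kind of payload. As a minimal
sketch of just that step (stripAttributes is a hypothetical name, and it
assumes no attribute value contains a ">" character):

```javascript
// Hypothetical helper: rewrite every tag as a bare <name> or </name>,
// discarding all attributes (style, event handlers, everything else).
function stripAttributes(html) {
  // groups: 1 = "/" of an end tag, 2 = tag name; [^>]* also spans newlines
  return html.replace(/<(\/?)(\w+)\b[^>]*>/g, function(_, slash, name) {
    return "<" + slash + name.toLowerCase() + ">";
  });
}

var payload = "<b style=\"background-image:\n" +
              "url(javascript\n" +
              ":document.location.href='http://mysexsite.example.com/')\">hi</b>";
stripAttributes(payload); // "<b>hi</b>"
```

It does not filter tag names at all, so on its own it is not a sanitizer.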
--
Lasse Reichstein Nielsen -
lr*@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'