Jane Doe <jane.doe@acme.com> writes:
> On 09 Sep 2003 14:03:34 +0200, Lasse Reichstein Nielsen
> <lrn@hotpop.com> wrote:
> > document.body.innerHTML =
> > document.body.innerHTML.replace(new RegExp(items.join("|"),""),"IGNORED");
>
> Thx a bunch Lasse for the prompt answer :-) It looks like a much
> better solution, although I'll still have to find out the following:
>
> 1. innerHTML only seems to work in IE. Doesn't work with Opera 5 and
> might not work with Netscape
It works in IE 4+, Opera 7 and Mozilla, and perhaps a few other recent
browsers. Any older browsers are out.
On the other hand, Netscape 4 and Opera 6 will not allow you to change
the contents of the page at all, after it is loaded. So there is no
method that works there.
If you can ignore IE 4, I would prefer to use DOM methods, traversing
the DOM tree and changing the text in the text nodes.
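
As a rough sketch of the DOM approach (not tested across browsers), the
function below walks a tree and rewrites only the text nodes. The name
replaceInTextNodes is made up for this example; it relies only on the
standard nodeType, nodeValue, firstChild and nextSibling properties.

```javascript
// Hypothetical helper: apply re -> replacement to every text node
// found at or below `node`. Element nodes are only traversed.
function replaceInTextNodes(node, re, replacement) {
  if (node.nodeType === 3) { // 3 is the nodeType of a text node
    node.nodeValue = node.nodeValue.replace(re, replacement);
    return;
  }
  for (var child = node.firstChild; child; child = child.nextSibling) {
    replaceInTextNodes(child, re, replacement);
  }
}

// In a browser, once the page has loaded, one might call:
//   replaceInTextNodes(document.body, /\b(John|Jane)\b/gi, "IGNORED");
```

Since it never touches the markup, tag names and attributes are left alone.
Note that it also visits text inside any script or style elements in the
body, so you may want to skip those.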
>
> 2. Only the first occurrence of the pattern is replaced, i.e. if I have
> (John|Jane), and those items both appear in the page, only the first
> occurrence is replaced (the second is ignored). I assume I need to add
> /g somewhere to tell JS to search & replace _all_ occurrences
Doh. Yes, the place to add the "g" is in the second argument to RegExp
(currently an empty string; make it "g", or perhaps even "gi").
Also notice that you match even inside words, so Johnson becomes
IGNOREDson. You can fix that by making the regular expression:
  new RegExp("\\b("+items.join("|")+")\\b","gi");
The "\b" matches the boundary between a word character and a non-word
character, so it won't match after "John" in "Johnson".
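
To see the difference between the three variants, here is a small
comparison on a plain string. The items array and the sample sentence
are made up for illustration.

```javascript
var items = ["John", "Jane"]; // example list, assumed for this sketch
var text = "John met Jane and Johnson.";

// No flags: only the first match is replaced.
var first = text.replace(new RegExp(items.join("|"), ""), "IGNORED");
// first is "IGNORED met Jane and Johnson."

// "g": every match is replaced, including the one inside "Johnson".
var all = text.replace(new RegExp(items.join("|"), "g"), "IGNORED");
// all is "IGNORED met IGNORED and IGNOREDson."

// "\\b" on both sides plus "gi": whole words only, in any letter case.
var words = text.replace(
  new RegExp("\\b(" + items.join("|") + ")\\b", "gi"), "IGNORED");
// words is "IGNORED met IGNORED and Johnson."
```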
> 3. I'm actually parsing rows in a table, so need to construct a more
> complicated search pattern than the one I gave to get started.
It is sometimes easier to split the problem into more than one regular
expression, e.g., one to find a table row and another to test whether
it contains the forbidden words. You can always combine them into one,
but the result is likely to be much bigger and harder to read.
> function clean() {
Ok. If we only aim at newer browsers, try this:
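function clean() {
  var body = document.body.innerHTML;
  var itemRE = new RegExp("\\b("+items.join("|")+")\\b","gi");
  // [\s\S] matches any character, including line breaks; it does the
  // same job as (.|\s) with less backtracking.
  body = body.replace(/<tr[\s\S]*?<\/tr>/gi,function(row) {
    // Drop the row if it contains any of the items, else keep it.
    if (row.match(itemRE)) {
      return "";
    } else {
      return row;
    }
  });
  document.body.innerHTML = body;
}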
It replaces each table row (from "<tr" to "</tr>") with either
itself or the empty string, depending on whether the row
contains any of the words in the "items" array.
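
If you want to try the idea outside a browser, here is the same two-step
approach as a function from string to string. This is only a sketch: the
names escapeRegExp and removeRowsContaining are invented for the example,
and the escaping step is my addition, meant for items that contain
characters like "." or "(" that are special in a regular expression.

```javascript
// Hypothetical helper: backslash-escape regular expression metacharacters.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return `html` without the table rows that
// contain any of `words` as a whole word (case-insensitive).
function removeRowsContaining(html, words) {
  if (words.length === 0) {
    return html; // an empty alternation would match every row
  }
  var escaped = [];
  for (var i = 0; i < words.length; i++) {
    escaped.push(escapeRegExp(words[i]));
  }
  // No "g" flag here, so test() keeps no state between rows.
  var wordRE = new RegExp("\\b(" + escaped.join("|") + ")\\b", "i");
  // Step 1 finds each row; step 2 decides whether to keep it.
  return html.replace(/<tr\b[\s\S]*?<\/tr>/gi, function (row) {
    return wordRE.test(row) ? "" : row;
  });
}
```

The test looks at the whole row, markup included, so a word that only
occurs in an attribute value will also remove the row. Nested tables are
not handled either, since the match stops at the first "</tr>".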
/L
--
Lasse Reichstein Nielsen -
lrn@hotpop.com
Art D'HTML: <URL:http://www.infimum.dk/HTML/randomArtSplit.html>
'Faith without judgement merely degrades the spirit divine.'