Bytes IT Community

Search/replace patterns in web pages?

Hi,

I need to search and replace patterns in web pages, but I
can't find a way, even after reading the relevant chapter in New Riders'
"Inside JavaScript".

Here's what I want to do:

function filter() {
    var items = new Array("John", "Jane");

    for (x = 0; x < items.length; x++) {
        //Doesn't work
        pattern = '/' + items[x] + '/';
        //Doesn't work either
        document.body = document.body.replace(pattern,"IGNORED");
    }
}

i.e., create an array of items to look for in the BODY section of the
page, and if any item exists, replace it with IGNORED.

Does anyone know how to do this?

Thank you
JD.
Jul 20 '05 #1


Jane Doe <ja******@acme.com> writes:
> Hi,
>
> I need to search and replace patterns in web pages, but I
> can't find a way, even after reading the relevant chapter in New Riders'
> "Inside JavaScript".
>
> Here's what I want to do:
>
> function filter() {
> var items = new Array("John", "Jane");
>
> for (x = 0; x < items.length; x++) {
> //Doesn't work
> pattern = '/' + items[x] + '/';

This builds a string, not a regular expression. (Make pattern a local
variable by declaring it with "var"; there is no need for it to be global.)

> //Doesn't work either
> document.body = document.body.replace(pattern,"IGNORED");


The object document.body is a DOM Node, not a text string.
What you can do, in some browsers, is to work on
document.body.innerHTML.

Also, change "pattern" to "new RegExp(items[x],'')" in this line. Then
you have created a regular expression with the name as content.

There is no need to run through the items one at a time.
You can replace the entire for loop with

document.body.innerHTML =
    document.body.innerHTML.replace(new RegExp(items.join("|"),""),"IGNORED");

(This way, the regular expression becomes "John|Jane". Since you replace
them with the same string, you can just match them all at the same time.)
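
A minimal sketch of the difference between the string pattern and a real RegExp, run on a plain string instead of document.body.innerHTML so it can be tried outside a browser. The helper name replaceFirst and the sample sentence are made up for illustration, and the sketch assumes the items contain no regular-expression metacharacters.

```javascript
// Hypothetical helper: replaces the first match of any item in `text`.
function replaceFirst(text, items) {
  // items.join("|") gives "John|Jane"; RegExp turns that into an alternation.
  var pattern = new RegExp(items.join("|"), "");
  return text.replace(pattern, "IGNORED");
}

var sample = "Posted by John, answered by Jane";

// A string such as "/John/" is searched for literally, slashes included,
// so nothing is replaced here.
console.log(sample.replace("/John/", "IGNORED"));
// prints "Posted by John, answered by Jane"

// With a real RegExp (and no "g" flag) only the first match is replaced.
console.log(replaceFirst(sample, ["John", "Jane"]));
// prints "Posted by IGNORED, answered by Jane"
```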

/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
Art D'HTML: <URL:http://www.infimum.dk/HTML/randomArtSplit.html>
'Faith without judgement merely degrades the spirit divine.'
Jul 20 '05 #2

On 09 Sep 2003 14:03:34 +0200, Lasse Reichstein Nielsen
<lr*@hotpop.com> wrote:
> document.body.innerHTML =
> document.body.innerHTML.replace(new RegExp(items.join("|"),""),"IGNORED");


Thx a bunch Lasse for the prompt answer :-) It looks like a much
better solution, although I still have to sort out the following:

1. innerHTML only seems to work in IE. It doesn't work with Opera 5 and
might not work with Netscape.

2. Only the first occurrence of the pattern is replaced, i.e. if I have
(John|Jane), and those items both appear in the page, only the first
occurrence is replaced (the second is ignored). I assume I need to add
/g somewhere to tell JS to search & replace _all_ occurrences.

3. I'm actually parsing rows in a table, so I need to construct a more
complicated search pattern than the one I gave to get started. The
goal is to replace any row that contains any of the items with an
empty row (i.e.
<tr><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>).

FWIW, here's what I'd like to do:

---------
function clean() {
    var items = new Array("John", "Jane");
    document.body.innerHTML = document.body.innerHTML.replace(
        new RegExp(items.join("|"),""),"IGNORED");
}

[...]

<body onload='clean()'>

<table>
<tr>
<td bgcolor="#FFFFFF" ><a
href="forum.php?forum=myforum&m=123">Title</a></td>
<td bgcolor="#FFFFFF">John</td>
<td bgcolor="#FFFFFF">10</td>
<td bgcolor="#FFFFFF">Posted 13 sept</td>
</tr>
<tr>
<td bgcolor="#FFFFFF" ><a
href="forum.php?forum=myforum&m=124">Title</a></td>
<td bgcolor="#FFFFFF">Jane</td>
<td bgcolor="#FFFFFF">2</td>
<td bgcolor="#FFFFFF">Posted 12 sept</td>
</tr>
</table>

---------

If you have any ideas or sample code somewhere on the Net, I'm interested
:-)

Thx again for your help
JD.
Jul 20 '05 #3

Jane Doe <ja******@acme.com> writes:
> On 09 Sep 2003 14:03:34 +0200, Lasse Reichstein Nielsen
> <lr*@hotpop.com> wrote:
>> document.body.innerHTML =
>> document.body.innerHTML.replace(new RegExp(items.join("|"),""),"IGNORED");
>
> Thx a bunch Lasse for the prompt answer :-) It looks like a much
> better solution, although I still have to sort out the following:
>
> 1. innerHTML only seems to work in IE. It doesn't work with Opera 5 and
> might not work with Netscape.

It works in IE 4+, Opera 7 and Mozilla, and perhaps a few other recent
browsers. Any older browsers are out.

On the other hand, Netscape 4 and Opera 6 will not allow you to change
the contents of the page at all, after it is loaded. So there is no
method that works there.

If you can ignore IE 4, I would prefer to use DOM methods, traversing
the DOM tree and changing the text in the text nodes.
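
A rough sketch of that DOM approach, assuming the DOM Level 1 properties nodeType, childNodes and nodeValue are available. The function name replaceInTextNodes is hypothetical, and the sketch makes no attempt to skip script or style elements inside the body.

```javascript
// Hypothetical helper: walks the tree below `node` and rewrites text nodes only.
// Assumes DOM Level 1 properties: nodeType, childNodes, nodeValue.
function replaceInTextNodes(node, regex, replacement) {
  var TEXT_NODE = 3; // numeric value of Node.TEXT_NODE
  if (node.nodeType === TEXT_NODE) {
    node.nodeValue = node.nodeValue.replace(regex, replacement);
    return;
  }
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    replaceInTextNodes(children[i], regex, replacement);
  }
}

// In a browser, once the page has loaded:
// replaceInTextNodes(document.body, /John|Jane/g, "IGNORED");
```

Because only text nodes are touched, tag names and attribute values (link URLs, for instance) are left alone, which a replace on innerHTML cannot guarantee.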

> 2. Only the first occurrence of the pattern is replaced, i.e. if I have
> (John|Jane), and those items both appear in the page, only the first
> occurrence is replaced (the second is ignored). I assume I need to add
> /g somewhere to tell JS to search & replace _all_ occurrences.

Doh. Yes, the place to add the "g" is in the second argument to RegExp
(currently an empty string; make it "g", or perhaps even "gi" to ignore
case as well).

Also notice that you match even inside words, so Johnson becomes
IGNOREDson. You can fix that by making the regular expression

new RegExp("\\b("+items.join("|")+")\\b","gi");

The "\b" matches the boundary between a word character and a non-word
character, so it won't match after "John" in "Johnson".
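
A quick way to see the effect of both changes, the "g" flag and the "\b" boundaries, is to run the pattern on a plain string. The helper name buildItemRegExp and the sample sentence are invented for illustration, and the sketch assumes the items contain no regular-expression metacharacters.

```javascript
// Hypothetical helper: whole-word, case-insensitive, global match of any item.
function buildItemRegExp(items) {
  return new RegExp("\\b(" + items.join("|") + ")\\b", "gi");
}

var sampleText = "John Johnson met JANE and Janet";

// Without \b, the names are also replaced inside longer words.
console.log(sampleText.replace(/John|Jane/gi, "IGNORED"));
// prints "IGNORED IGNOREDson met IGNORED and IGNOREDt"

// With \b, only whole words are replaced.
console.log(sampleText.replace(buildItemRegExp(["John", "Jane"]), "IGNORED"));
// prints "IGNORED Johnson met IGNORED and Janet"
```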

> 3. I'm actually parsing rows in a table, so I need to construct a more
> complicated search pattern than the one I gave to get started.

It is sometimes easier to split the problem into more than one regular
expression, e.g. one to find a table row and another to test whether
it contains the forbidden words. You can always combine them, but the
result might be horribly large.

> function clean() {


Ok. If we only aim at newer browsers, try this:

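function clean() {
    // The names to filter out (taken from the original post).
    var items = new Array("John", "Jane");
    var body = document.body.innerHTML;
    // Matches any of the names as a whole word, ignoring case.
    var itemRE = new RegExp("\\b("+items.join("|")+")\\b","gi");
    // (.|\s)*? lazily matches everything, newlines included, up to the next </tr>.
    body = body.replace(/<tr(.|\s)*?<\/tr>/gi,function(row) {
        if (row.match(itemRE)) {
            return ""; // drop rows that mention one of the names
        } else {
            return row; // keep every other row unchanged
        }
    });
    document.body.innerHTML = body;
}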

It replaces each table row (from "<tr" to "</tr>") with either
itself or the empty string, depending on whether the row
contains any of the words in the "items" array.
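
The original request was to blank the matching rows instead of removing them. A possible variant is sketched below as a function on an HTML string, so it can be tried outside a browser. The name blankMatchingRows is hypothetical, the four-cell blank row is the one given earlier in the thread, and [\s\S] is swapped in for (.|\s) as an equivalent way to match across newlines. Since the test runs on raw HTML, a name that appears in an attribute or a link URL also blanks the row.

```javascript
// Hypothetical helper: returns `html` with every table row that mentions
// one of `items` (as a whole word, ignoring case) replaced by a blank row.
function blankMatchingRows(html, items) {
  // No "g" flag here, so test() keeps no state between rows.
  var itemRE = new RegExp("\\b(" + items.join("|") + ")\\b", "i");
  var blankRow =
    "<tr><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>";
  return html.replace(/<tr[\s\S]*?<\/tr>/gi, function (row) {
    return itemRE.test(row) ? blankRow : row;
  });
}

// In a browser, once the page has loaded:
// document.body.innerHTML =
//   blankMatchingRows(document.body.innerHTML, ["John", "Jane"]);
```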

/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
Jul 20 '05 #4

On 09 Sep 2003 14:56:54 +0200, Lasse Reichstein Nielsen
<lr*@hotpop.com> wrote:
> Ok. If we only aim at newer browsers, try this:


You're awesome :-) Works like a charm. I owe you dinner next time
you're in town.

Thx again
JD.
Jul 20 '05 #5
