Regular expression to exclude lines?

Shannon Jacobs

Sorry to ask what is surely a trivial question. Also sorry that I don't have
my current code version on hand, but... Anyway, must be some problem with
trying to do the negative. It seems like I get into these ruts each time I
try to deal with regular expressions.

All I'm trying to do is delete the lines which don't contain a particular
string. Actually a filter to edit a log file. I can find and replace a thing
with null, but can't figure out how to find the lines which do not contain
the thing.

Going further, I want to generalize and use a JavaScript variable containing
the decision string, but first I need to worry about the not-within-a-line
problem.

Jul 20 '05 #1

Thomas 'PointedEars' Lahn

Shannon Jacobs wrote:

Sorry to ask what is surely a trivial question.
Hm, I don't think it is that trivial.
All I'm trying to do is delete the lines which don't contain a
particular string. Actually a filter to edit a log file. I can
find and replace a thing with null, but can't figure out how to
find the lines which do not contain the thing.

Here's a quickhack that filters out of three lines the one that
does not contain the word `line':

alert("this is a line\nthis is a\nthis is a line".match(/\n*[^\n]*\n*([^\n]*[^l][^i][^n][^e][^\n]*\n)*[^\n]*\n*/)[1])

But there must be a better way; IIRC there is something
called `negative lookahead', supported from JavaScript 1.5 on,
which I have not yet worked with.
PointedEars

Jul 20 '05 #2

Lasse Reichstein Nielsen

Thomas 'PointedEars' Lahn <Po*********@web.de> writes:

Shannon Jacobs wrote:
Sorry to ask what is surely a trivial question.
Hm, I don't think it is that trivial.

Neither do I. Negative matches in regular expressions rarely are.
Here's a quickhack that filters out of three lines the one that
does not contain the word `line':

alert("this is a line\nthis is a\nthis is a line".match(/\n*[^\n]*\n*([^\n]*[^l][^i][^n][^e][^\n]*\n)*[^\n]*\n*/)[1])
That's purely accidental. If you add a line in front, e.g.,
"bad thing\nthis is a line\nthis is a\n this is a line", it matches
the string containing the second and third line.
But there must be a better way; IIRC there is something
called `negative lookahead', supported from JavaScript 1.5 on,
which I have not yet worked with.

Negative lookahead might be an easier way to do it.

The hard way:

/^([^l\n]*(l[^i]|li[^n]|lin[^e]))*([^l\n])*$/m
(any "l" is not followed by "ine")

With negative lookahead:
/^([^l\n]*l(?!ine))*[^l\n]*$/m

The "m" at the end makes "^" and "$" match beginning/end of line.

These regexps only check for the letters "line", not whether they
occur as a word. To do that, one must check for word boundaries around it:

Hard:
/^([^l\n]*(\bl([^i]|i[^n]|in[^e]|ine\B)|\Bl))*[^l\n]*$/m
Easy:
/^([^l\n]*(\bl(?!ine\b)|\Bl))*[^l\n]*$/m

Any "l" right after a word boundary is not followed by ine+word boundary.
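To see the difference between the substring version and the word version, both can be run against a few sample lines. A minimal sketch (the sample strings are made up); each `test()` returns true when the line does *not* contain "line", as a substring or as a whole word respectively:

```javascript
// Lasse's two patterns from above, unchanged.
// lacksSubstring.test(s) is true when s does not contain "line" at all.
const lacksSubstring = /^([^l\n]*l(?!ine))*[^l\n]*$/m;
// lacksWord.test(s) is true when s does not contain "line" as a word.
const lacksWord = /^([^l\n]*(\bl(?!ine\b)|\Bl))*[^l\n]*$/m;

for (const s of ["line", "linefeed", "nonline", "no l-word here"]) {
  console.log(JSON.stringify(s),
              "lacks substring:", lacksSubstring.test(s),
              "lacks word:", lacksWord.test(s));
}
// "linefeed" and "nonline" contain the substring but not the word "line".
```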

To test this regexp, try:
---
var regexp = /^([^l\n]*(\bl(?!ine\b)|\Bl))*[^l\n]*$/mg ;
var lines = "nonline\nline\nlinefeed\nwith line in the middle\n"+
"no l-word here\n\nprevious l-word was empty\nand ending in line";
var dellines = lines.replace(regexp,"---DELETED---");
alert(lines);
alert(dellines);
---
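Outside a browser, for example in Node.js, the same test can be run by replacing `alert` with `console.log`. A sketch; lines that contain the word "line" survive, and every other line, including the empty one, is replaced:

```javascript
// Lasse's test with console.log instead of alert. The g flag makes
// replace() process every line; m makes ^ and $ match per line.
const regexp = /^([^l\n]*(\bl(?!ine\b)|\Bl))*[^l\n]*$/mg;
const lines = "nonline\nline\nlinefeed\nwith line in the middle\n" +
              "no l-word here\n\nprevious l-word was empty\nand ending in line";
const dellines = lines.replace(regexp, "---DELETED---");
console.log(dellines);
// Kept: "line", "with line in the middle" and "and ending in line";
// the other five lines (including the empty one) become "---DELETED---".
```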
A longer explanation of:
/^([^l\n]*(\bl(?!ine\b)|\Bl))*[^l\n]*$/m
^                 beginning of line
([^l\n]*          some non-l/non-newlines,
(\bl(?!ine\b)     then either word boundary + l not followed by "ine"+word boundary
|\Bl)             or l not after a word boundary,
)*                any number of times,
[^l\n]*$          and then some non-l/non-newlines again (to the end of the line).

Good luck:)
/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'

Jul 20 '05 #3

Evertjan.

Lasse Reichstein Nielsen wrote on 24 nov 2003 in comp.lang.javascript:

Negative lookahead might be an easier way to do it.

What about this non-greedy "*?" form:

<script>

function replLine(x,t){
t+="\n"
var re = new RegExp("[^\n]*?"+x+"[^\n]*\n","g");
t = t.replace(re ,"")
return t.replace(/\n$/,"")
}

tx="bad thing\nthis is a line\nthis is a\n this is a line"

alert(replLine("thing",tx))
alert(replLine("line",tx))

</script>

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)

Jul 20 '05 #4

Evertjan.

Evertjan. wrote on 24 nov 2003 in comp.lang.javascript:

Lasse Reichstein Nielsen wrote on 24 nov 2003 in comp.lang.javascript:
Negative lookahead might be an easier way to do it.

What about this non-greedy "*?" form:

<script>

function replLine(x,t){
t+="\n"
var re = new RegExp("[^\n]*?"+x+"[^\n]*\n","g");
t = t.replace(re ,"")
return t.replace(/\n$/,"")
}

tx="bad thing\nthis is a line\nthis is a\n this is a line"

alert(replLine("thing",tx))
alert(replLine("line",tx))

</script>

"All I'm trying to do is delete the lines which don't contain a
particular string. "

Wow, I missed the "n't"

I will try again later.

Jul 20 '05 #5

Evertjan.

Evertjan. wrote on 24 nov 2003 in comp.lang.javascript:

"All I'm trying to do is delete the lines which don't contain a
particular string. "

Wow, I missed the "n't"

I will try again later.

This better?

<script>

function replLine(x,t){
var re = new RegExp(x,"");
t+="\n"
t = t.replace(
/.*?\n/g,
function($0,$1,$2)
{return (!re.test($0))?$0:""}
)
return t.replace(/\n$/,"")
}

tx="bad thing\nthis is a line\nthis is a\n this is a line"

alert(replLine("thing",tx))
alert(replLine("line",tx))

</script>

Jul 20 '05 #6

Evertjan.

Evertjan. wrote on 24 nov 2003 in comp.lang.javascript:

Evertjan. wrote on 24 nov 2003 in comp.lang.javascript:
"All I'm trying to do is delete the lines which don't contain a
particular string. "

Wow, I missed the "n't"

I will try again later.

This better?

<script>

function replLine(x,t){
var re = new RegExp(x,"");
t+="\n"
t = t.replace(
/.*?\n/g,
function($0,$1,$2)
{return (!re.test($0))?$0:""}
)
return t.replace(/\n$/,"")
}

tx="bad thing\nthis is a line\nthis is a\n this is a line"

alert(replLine("thing",tx))
alert(replLine("line",tx))

</script>

Monologue follows.

Damn, forgot to remove the "!"

<script>

function replLine(x,t){
var re = new RegExp(x,"");
t+="\n"
t = t.replace(
/.*?\n/g,
function($0,$1,$2)
{return (re.test($0))?$0:""}
)
return t.replace(/\n$/,"")
}

tx="bad thing\nthis is a line\nthis is a\n this is a line"

alert(replLine("thing",tx))
alert(replLine("line",tx))

</script>

Jul 20 '05 #7

Lasse Reichstein Nielsen

"Evertjan." <ex**************@interxnl.net> writes:

Evertjan. wrote on 24 nov 2003 in comp.lang.javascript:
This better?

Monologue follows.

Damn, forgot to remove the "!"

function replLine(x,t){
var re = new RegExp(x,"");
t+="\n"
t = t.replace(
/.*?\n/g,
function($0,$1,$2)
{return (re.test($0))?$0:""}
)
return t.replace(/\n$/,"")
}

This first splits the string into lines, and then replaces each line
based on a second test.
It should work (and seems to).

I don't think you need a non-greedy match (.*?) since . doesn't match
a newline character.
Maybe you can avoid adding the extra "\n" by using multiline
matching: /^.*$/gm.
That doesn't remove the newlines in the string, though. None of my
attempts have done that so far.
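A sketch of that idea which also gets rid of the newlines: let the multiline pattern take each line together with its own line terminator, and decide in a replacement function whether to keep it. This assumes lines end in \r\n, \n or \r; `keepMatchingLines` is a hypothetical name. The test RegExp should not have the g flag, since a global RegExp carries `lastIndex` over between `test()` calls.

```javascript
// Keep only the lines of `text` that match `re`. Each match is one line
// plus its terminator, so a dropped line takes its newline with it.
function keepMatchingLines(text, re) {
  // m: ^ matches at every line start; . never crosses a line terminator;
  // the optional group then consumes \r\n, \n or \r if present.
  return text.replace(/^.*(?:\r\n|\n|\r)?/gm, function (line) {
    return re.test(line) ? line : "";
  });
}

const tx = "bad thing\nthis is a line\nthis is a\n this is a line";
console.log(JSON.stringify(keepMatchingLines(tx, /line/)));
// "this is a line\n this is a line"
console.log(JSON.stringify(keepMatchingLines(tx, /thing/)));
// "bad thing\n"
```

Note that the string handed to `re` still ends with its line terminator, so a test anchored with `$` (without the m flag) would not match it.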

This method uses several regexp matches, not just one (which is
sometimes the better approach :), but the first is really just
splitting into lines. You can use the split method for that.

How about this:

// returns new array containing only those elements that match re
Array.prototype.filter = function filter(re) {
var res = [];
for (var i=0;i<this.length;i++) {
if (re.test(this[i])) {res.push(this[i]);}
}
return res;
}

var tx="bad thing\nthis is a line\nthis is a\n this is a line";
alert(tx.split("\n").filter(/line/).join("\n"));

Sadly, adding properties to Array.prototype means that you can't
(easily) use
for (var i in this)
to iterate through a sparse array. The filter method is enumerable,
so it is also included.
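In engines implementing ECMAScript 5 or later, arrays already have a built-in `filter` method that takes a callback instead of a RegExp, so assigning a RegExp-based `Array.prototype.filter` as above would replace it and break other code relying on the standard behaviour. A sketch of two alternatives: calling the built-in method with a callback, or, if a RegExp helper on arrays is really wanted, defining it under a different name with `Object.defineProperty`, which creates a non-enumerable property by default, so `for...in` no longer lists it. The name `filterByRegExp` is hypothetical.

```javascript
const tx = "bad thing\nthis is a line\nthis is a\n this is a line";

// 1. Built-in Array.prototype.filter: keeps the elements for which
//    the callback returns true.
const kept = tx.split("\n").filter(function (line) { return /line/.test(line); });
console.log(kept.join("\n"));

// 2. A RegExp helper under its own name, defined as non-enumerable.
Object.defineProperty(Array.prototype, "filterByRegExp", {
  value: function (re) {
    const res = [];
    for (let i = 0; i < this.length; i++) {
      // Skip holes in sparse arrays instead of testing "undefined".
      if (i in this && re.test(this[i])) { res.push(this[i]); }
    }
    return res;
  },
  writable: true,
  configurable: true
  // enumerable defaults to false
});

const sparse = [];
sparse[3] = "this is a line";
sparse[7] = "this is a";
for (const key in sparse) {
  console.log(key);  // "3" and "7" only; filterByRegExp is not listed
}
console.log(JSON.stringify(sparse.filterByRegExp(/line/)));  // ["this is a line"]
```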

/L

Jul 20 '05 #8

Evertjan.

Lasse Reichstein Nielsen wrote on 24 nov 2003 in comp.lang.javascript:

I don't think you need a non-greedy match (.*?) since . doesn't match
a newline character.

True !

Jul 20 '05 #9

Dr John Stockton

JRS: In article <3F**************@PointedEars.de>, seen in
news:comp.lang.javascript, Thomas 'PointedEars' Lahn
<Po*********@web.de> posted at Mon, 24 Nov 2003 16:39:25 :-

Shannon Jacobs wrote:
Sorry to ask what is surely a trivial question.

Hm, I don't think it is that trivial.
All I'm trying to do is delete the lines which don't contain a
particular string. Actually a filter to edit a log file. I can
find and replace a thing with null, but can't figure out how to
find the lines which do not contain the thing.

Here's a quickhack that filters out of three lines the one that
does not contain the word `line':

alert("this is a line\nthis is a\nthis is a
line".match(/\n*[^\n]*\n*([^\n]*[^l][^i][^n][^e][^\n]*\n)*[^\n]*\n*/)[1])

But there must be a better way; IIRC there is something
called `negative lookahead', supported from JavaScript 1.5 on,
which I have not yet worked with.

AIUI, the OP wants a file which is the previous file minus those lines
which do not contain the string. That code, after broken-string
correction, pops up a box showing the first unwanted line.

Javascript "alone" is not capable of file handling, AFAICS.

If the OP can read and write the file line by line, controlled by
javascript, and apply script to each line, then it is only necessary to
do (pseudo-code follows)

while not EoF(FI) do begin Readln(FI, S) ; // pascal
if ( ! /«string»/.test(S) ) continue // javascript
Writeln(FO, S) end ; // pascal
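The read/test/write loop in that pseudo-code can be expressed in JavaScript as a generator that passes on only the wanted lines, leaving open where the lines come from and where they go. A sketch; `keepLines` is a hypothetical name and the RegExp is assumed not to have the g flag.

```javascript
// Yield only the lines that match `re`, mirroring
// "if ( ! /string/.test(S) ) continue" in the pseudo-code.
function* keepLines(lines, re) {
  for (const line of lines) {
    if (!re.test(line)) continue;  // unwanted line: skip it
    yield line;                    // wanted line: the "Writeln" step
  }
}

const input = ["bad thing", "this is a line", "this is a", " this is a line"];
console.log([...keepLines(input, /line/)].join("\n"));
```

Any iterable of strings works as `lines`. For reading a file in recent Node.js versions, an async variant (`async function*` with `for await`) could consume a `readline` interface created over `fs.createReadStream`, since that interface is async-iterable.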

The OP has MSOE, which suggests Windows. If the job is to be run in
DOS, Windows, or UNIX, then the task is trivial using MiniTrue, which
IMHO is a most valuable tool. Example :

mtr -no~ jt.htm - e

will put, on standard output, all those lines of jt.htm which do not
contain the letter e. A RegExp can be used for the search, in place
of e. There may be a way of doing it without using standard output.

--
© John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 MIME. ©
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
I find MiniTrue useful for viewing/searching/altering files, at a DOS prompt;
free, DOS/Win/UNIX, <URL:http://www.idiotsdelight.net/minitrue/> Update soon?

Jul 20 '05 #10

Mark Szlazak

Try this to exclude lines that don't have "something" in them:

rx = /^(?:(?!\bsomething\b).)*$/gm;
output = input.replace(rx,'');
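Applied to a small made-up sample, the pattern empties every line in which "something" does not occur as a whole word; the line breaks stay, so those lines remain as blank lines:

```javascript
// m: ^ and $ match at each line's start and end; g: replace every match.
// (?:(?!\bsomething\b).)* consumes characters only at positions where
// the word "something" does not start.
const rx = /^(?:(?!\bsomething\b).)*$/gm;
const input = "a something b\nnothing here\nsomething";
const output = input.replace(rx, "");
console.log(JSON.stringify(output));  // "a something b\n\nsomething"
```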

Jul 20 '05 #11

Dr John Stockton

JRS: In article <3f*********************@news3.asahi-net.or.jp>, seen
in news:comp.lang.javascript, Shannon Jacobs <sh****@my-deja.com> posted
at Mon, 24 Nov 2003 21:50:22 :-

All I'm trying to do is delete the lines which don't contain a particular
string. Actually a filter to edit a log file. I can find and replace a thing
with null, but can't figure out how to find the lines which do not contain
the thing.

Of course, for a *particular* string, not requiring a RegExp, DOS batch
provides the answer, and there must surely be a UNIX equivalent.

find "mystring" < old.log > new.log

It seems likely that someone has written a version of DOS find that
accepts RegExps; given RegExp code in library form, the job seems
trivial. UNIX has grep, which should do; and there are ports of grep to
DOS & Windows. Also, WSH has file I/O and RegExps, AIUI.

The important thing seems to be to not bother with a RegExp
substitution, but to work line-by-line and do a RegExp (or other) test
of acceptability.

--
© John Stockton, Surrey, UK. ?@merlyn.demon.co.uk DOS 3.3, 6.20; Win98. ©
Web <URL:http://www.merlyn.demon.co.uk/> - FAQqish topics, acronyms & links.
PAS EXE TXT ZIP via <URL:http://www.merlyn.demon.co.uk/programs/00index.htm>
My DOS <URL:http://www.merlyn.demon.co.uk/batfiles.htm> - also batprogs.htm.

Jul 20 '05 #12

Thomas 'PointedEars' Lahn

Dr John Stockton wrote:

Of course, for a *particular* string, not requiring a RegExp, DOS batch
provides the answer, and there must surely be a UNIX equivalent.

find "mystring" < old.log > new.log

grep 'mystring' old.log >new.log 2>&1

The single quotes are only required if special shell expressions are
used but not escaped. 2>&1 captures (error) messages in new.log, too.
If you do not want that, leave it out.
PointedEars

Jul 20 '05 #13

Dr John Stockton

JRS: In article <3F**************@PointedEars.de>, seen in
news:comp.lang.javascript, Thomas 'PointedEars' Lahn
<Po*********@web.de> posted at Wed, 26 Nov 2003 17:27:13 :-

Dr John Stockton wrote:
Of course, for a *particular* string, not requiring a RegExp, DOS batch
provides the answer, and there must surely be a UNIX equivalent.

find "mystring" < old.log > new.log

grep 'mystring' old.log >new.log 2>&1

The single quotes are only required if special shell expressions are
used but not escaped. 2>&1 captures (error) messages in new.log, too.
If you do not want that, leave it out.

I'm not aware of 2>&1 being valid in either DOS or Win98, and GREP is
not part of those systems but must be imported.

Why did you cut the part where I wrote "UNIX has grep, which should do;
and there are ports of grep to DOS & Windows." ?

IMHO, MiniTrue is more useful than GREP and SED; see my reply to your
earlier post.

Jul 20 '05 #14

Thomas 'PointedEars' Lahn

Dr John Stockton wrote:

Thomas 'PointedEars' Lahn wrote:
Dr John Stockton wrote:
Of course, for a *particular* string, not requiring a RegExp, DOS batch
provides the answer, and there must surely be a UNIX equivalent.

find "mystring" < old.log > new.log
grep 'mystring' old.log >new.log 2>&1

The single quotes are only required if special shell expressions are
used but not escaped. 2>&1 captures (error) messages in new.log, too.
If you do not want that, leave it out.

I'm not aware of 2>&1 being valid in either DOS or Win98,

It is only available in Cmd.exe of Windows NT-based systems and
Unices, generally speaking in POSIX-compatible shells, of course.
and GREP is not part of those systems but must be imported.
Of course. My posting was only regarding "there must
surely be a UNIX equivalent". Here you are :)
Why did you cut the part where I wrote "UNIX has grep, which should do;
and there are ports of grep to DOS & Windows." ?
I just overlooked it.
IMHO, MiniTrue is more useful than GREP and SED; see my reply to your
earlier post.

MiniTrue is not part of a basic installation of Unices,
though. (Instead, mtr refers to Matt's Traceroute.)
F'up2 poster

PointedEars

Jul 20 '05 #15

Dr John Stockton

JRS: In article <3F**************@PointedEars.de>, seen in
news:comp.lang.javascript, Thomas 'PointedEars' Lahn
<Po*********@web.de> posted at Thu, 27 Nov 2003 01:27:41 :-

Dr John Stockton wrote:
Thomas 'PointedEars' Lahn wrote:
grep 'mystring' old.log >new.log 2>&1

The single quotes are only required if special shell expressions are
used but not escaped. 2>&1 captures (error) messages in new.log, too.
If you do not want that, leave it out.

I'm not aware of 2>&1 being valid in either DOS or Win98,

It is only available in Cmd.exe of Windows NT-based systems and
Unices, generally speaking in POSIX-compatible shells, of course.

There is no reason to assume that the OP is aware of that; or even of
that part of it that is applicable to the system in question. A
plausible but inapplicable or incorrect "solution" is worse than
useless.

IMHO, MiniTrue is more useful than GREP and SED; see my reply to your
earlier post.

MiniTrue is not part of a basic installation of Unices,
though. (Instead, mtr refers to Matt's Traceroute.)

Indeed; nor of DOS; which is sufficiently clearly indicated by my
signature to that article.

--
© John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 MIME ©
Web <URL:http://www.uwasa.fi/~ts/http/tsfaq.html> -> Timo Salmi: Usenet Q&A.
Web <URL:http://www.merlyn.demon.co.uk/news-use.htm> : about usage of News.
No Encoding. Quotes before replies. Snip well. Write clearly. Don't Mail News.

Jul 20 '05 #16

Thomas 'PointedEars' Lahn

Dr John Stockton wrote:

Thomas 'PointedEars' Lahn wrote:
Dr John Stockton wrote:
Thomas 'PointedEars' Lahn wrote:
grep 'mystring' old.log >new.log 2>&1

The single quotes are only required if special shell
expressions are used but not escaped. 2>&1 captures (error)
messages in new.log, too. If you do not want that, leave it
out.

I'm not aware of 2>&1 being valid in either DOS or Win98,
It is only available in Cmd.exe of Windows NT-based systems and
Unices, generally speaking in POSIX-compatible shells, of course.

There is no reason to assume that the OP is aware of that;

There is no reason why she should be interested in it, either.
or even of that part of it that is applicable to the system in
question.
The system in question is unknown.
A plausible but inapplicable or incorrect "solution" is worse than
useless.

This is a JavaScript newsgroup, not a shell script newsgroup. As the
OP has not provided the system on which solutions are supposed to work,
she will test if the provided solutions will work on her system.
PointedEars

Jul 20 '05 #17

Shannon Jacobs

Thomas 'PointedEars' Lahn wrote:

Dr John Stockton wrote:

<snip>

There is no reason to assume that the OP is aware of that;

There is no reason why she should be interested in it, either.

<snip>

Return of the original poster... Sorry I've been rather busy and haven't
been able to follow this very interesting thread more closely. (The OP is
male, by the way.) However, judging by the complexity of the discussion, it
seems that there was some reason for my original perplexity, though I
thought it a trivial notion.

First let me try to clarify what I'm doing. I know that JavaScript has no
access to the file system. The files are to be handled directly by the user
of the utility. In the Windows environment, this is trivial with ^A, ^C, and
^V. I didn't mention that part because it's almost mindless now (for me).
The actual steps for the file part are:

Open the converter JavaScript form, then open the target file, ^A, ^C, click
in the form, ^V, click on the convert button of the form, ^A, ^C, click back
in the original file, ^A, ^V, and save the file. Done. (If anyone is curious
and the results of these steps are not obvious enough, I can explain.)

Now to clarify the JavaScript part. This example is from an existing utility
that converts raw HTML into JavaScript. The variable HtmlText is the body of
the file from an input field in the form. The critical function is:

function jsFromHtml(HtmlText) {
HtmlText = HtmlText.replace(/\"/g,"\\\"");
HtmlText = HtmlText.replace(/[\r\n]+/g,"\");\r\ndocument.writeln(\"");
return "document.writeln(\"" + HtmlText + "\");";
}

I know this is rather ugly code, and I'd also be interested in improvements,
or even a completely different approach. My JavaScript skills are obviously
rather limited, but this was adequate for my purposes at the time. Since
it's probably not IOttMCO, I'll explain what it does. In the first
executable line, the regular expression escapes all of the double quotes in
the original HTML. In the next line, all of the embedded line breaks are
replaced with the end and start of document.writeln statements, and then the
last line puts one more start and end around the entire thing. The result is
a block of JavaScript code which outputs the arbitrary HTML input. You stick
that into a JavaScript function to create that block of HTML under program
control wherever it is required. (I was especially unhappy with my treatment
of line breaks, and believe this is not a properly general method, though it
works.)

My goal now is to do something similar, but excluding the lines that do not
contain some string. I'm most interested in an elegant solution, though the
discussion so far seems to suggest that there may be no better approach than
parsing the input one line at a time...

An additional wrinkle is that I'd like to generalize a bit by treating the
decision string as a parameter returned in another field of the form.

Jul 20 '05 #18

Lasse Reichstein Nielsen

"Shannon Jacobs" <sh****@my-deja.com> writes:

First let me try to clarify what I'm doing. I know that JavaScript has no
access to the file system. The files are to be handled directly by the user
of the utility. In the Windows environment, this is trivial with ^A, ^C, and
^V.
I have pages like that too, for colorizing HTML and Javascript :)
Now to clarify the JavaScript part. This example is from an existing utility
that converts raw HTML into JavaScript. The variable HtmlText is the body of
the file from an input field in the form. The critical function is:

function jsFromHtml(HtmlText) {
HtmlText = HtmlText.replace(/\"/g,"\\\"");
HtmlText = HtmlText.replace(/[\r\n]+/g,"\");\r\ndocument.writeln(\"");
return "document.writeln(\"" + HtmlText + "\");";
}

I know this is rather ugly code, and I'd also be interested in improvements,
or even a completely different approach.
I would use split:

function jsFromHtml(HtmlText) { // I would write HTML in all caps :)
var inputLines = HtmlText.split(/[\r\n]+/);
var outputLines = [];
for (var i=0;i<inputLines.length;i++) {
var safeLine = inputLines[i].replace(/[\\"]/g,"\\$&");
outputLines[i] = "document.writeln(\"" + safeLine + "\");" ;
}
return outputLines.join("\n");
}

(I put a backslash in front of both double quotes and backslashes,
since neither can occur unescaped in a string literal. If there are
other characters that make no sense in a string literal, they should
be handled as well. Examples could be \t or \b.)
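For such other characters, one option in engines with a built-in `JSON` object (ECMAScript 5 and later) is to let `JSON.stringify` build each string literal: it escapes double quotes, backslashes and control characters such as tab, and in current engines its output is a valid JavaScript string literal. A sketch of that variant; it splits on single line terminators so blank lines are kept, and it does not protect against a literal `</script>` in the input, which would still end an enclosing inline script element.

```javascript
// Turn each input line into a document.writeln(...) statement;
// JSON.stringify produces the quoted, escaped string literal.
function jsFromHtml(htmlText) {
  const inputLines = htmlText.split(/\r\n|\n|\r/);
  const outputLines = inputLines.map(function (line) {
    return "document.writeln(" + JSON.stringify(line) + ");";
  });
  return outputLines.join("\n");
}

console.log(jsFromHtml('<p class="x">C:\\temp</p>\n<b>\tbold</b>'));
// document.writeln("<p class=\"x\">C:\\temp</p>");
// document.writeln("<b>\tbold</b>");
```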
My JavaScript skills are obviously rather limited,
Not obviously. It works, and it's something I could find myself doing if
I didn't have split available. It's possibly even faster to change all
the quotes from the beginning instead of doing one replace per line.
but this was adequate for my purposes at the time. Since it's
probably not IOttMCO,
You lost me there :) IOttMCO?
I'll explain what it does.
It's fairly easy to read, as long as you can see what's inside a string
and what's not :)
My goal now is to do something similar, but excluding the lines that do not
contain some string. I'm most interested in an elegant solution, though the
discussion so far seems to suggest that there may be no better approach than
parsing the input one line at a time...
Nothing wrong with one line at a time. If you use the code I showed above,
all you need is to wrap the content of the for loop in an if statement:

if (!/badWord/.test(inputLines[i])) {
... add to output ...
}

or

if (inputLines[i].indexOf("badWord") == -1) {
... add to output ...
}

An additional wrinkle is that I'd like to generalize a bit by treating the
decision string as a parameter returned in another field of the form.

var testRE = RegExp(form.elements['otherField'].value);
if (! testRE.test(inputLines[i])) {
...
}

(To avoid problems or crashes, you might want to screen the other field's
values for characters that are meaningful in regular expressions)
or

var testWord = form.elements['otherField'].value;
if (inputLines[i].indexOf(testWord)==-1) {
... add to output ...
}
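Note that the fragments above keep the lines that do *not* contain the test string; for the original goal of keeping only the lines that *do* contain it, the test is simply not negated. A sketch using split and a plain substring test, so characters that are special in regular expressions need no screening; `keepLinesContaining` is a hypothetical name, and `String.prototype.includes` needs ECMAScript 2015 (`indexOf(key) != -1` does the same in older engines).

```javascript
// Keep only the lines that contain `key` as a plain substring.
// Splitting on \r\n, \n or \r covers Windows, Unix and old Mac line ends.
function keepLinesContaining(text, key) {
  return text
    .split(/\r\n|\n|\r/)
    .filter(function (line) { return line.includes(key); })
    .join("\n");
}

// In the form, `key` could be form.elements['otherField'].value.
const tx = "bad thing\nthis is a line\nthis is a\n this is a line";
console.log(keepLinesContaining(tx, "line"));
console.log(keepLinesContaining("a+b\nab", "a+b"));  // "+" is not special here
```

The result always uses \n between lines, whatever line ends the input had.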
Good luck
/L

Jul 20 '05 #19

Thomas 'PointedEars' Lahn

Lasse Reichstein Nielsen wrote:

"Shannon Jacobs" <sh****@my-deja.com> writes:
but this was adequate for my purposes at the time. Since it's
probably not IOttMCO,

You lost me there :) IOttMCO?

Intuitively Obvious to the Most Casual Observer

http://babylon.com/ (the tool) or http://online.babylon.com/combo/
(the website) are sometimes quite handy :)
HTH

PointedEars

Jul 20 '05 #20

Mark Szlazak

"Shannon Jacobs" <sh****@my-deja.com> wrote in message news:<3f***********************@news3.asahi-net.or.jp>...

My goal now is to do something similar, but excluding the lines that do not
contain some string. I'm most interested in an elegant solution, though the
discussion so far seems to suggest that there may be no better approach than
parsing the input one line at a time...

An additional wrinkle is that I'd like to generalize a bit by treating the
decision string as a parameter returned in another field of the form.

I've tried twice to post a simple solution through Developers Dex but
they haven't appeared in about two days. I'm assuming they're lost.
Anyway, my previous post that did appear starts to point you to a
solution. It doesn't require a separate process to break up lines and
it works. To remove lines without the substring "something" in them,
here's that solution again.

rx = /^(?:(?!\bsomething\b).)*$/gm;
outText = inText.replace(rx,'');

To make this regular expression dynamic, use the RegExp object
constructor.

skip = 'something';
pattern = '^(?:(?!\\b' + skip + '\\b).)*$';
rx = new RegExp(pattern, 'gm');
outText = inText.replace(rx,'');
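When `skip` comes from a form field, characters such as `.`, `+` or `(` would be taken as regular-expression syntax. One common remedy is to put a backslash in front of each metacharacter before building the pattern. A sketch; `escapeRegExp` and `removeLinesWithout` are hypothetical helper names, and `\b` only acts as a word boundary if `skip` begins and ends with a word character (letter, digit or underscore).

```javascript
// Prefix every regular-expression metacharacter with a backslash so that
// the string is matched literally inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Mark's dynamic pattern, with the key escaped first. Lines that do not
// contain `skip` as a word are emptied (their line breaks remain).
function removeLinesWithout(inText, skip) {
  const pattern = "^(?:(?!\\b" + escapeRegExp(skip) + "\\b).)*$";
  const rx = new RegExp(pattern, "gm");
  return inText.replace(rx, "");
}

console.log(JSON.stringify(removeLinesWithout("v1.2 ok\nv1x2 bad\nother", "v1.2")));
// "v1.2 ok\n\n"
// Without escaping, "." would also match the "x" in "v1x2", so that line would be kept.
```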

Also, one of your posts talks about linefeeds and the \r\n pattern.
This is OS dependent and linefeeds could also be just \r or \n.

Jul 20 '05 #21

Shannon Jacobs

Mark Szlazak wrote:
<snip of lengthy text describing goal of deleting lines that do not include
a key string>

rx = /^(?:(?!\bsomething\b).)*$/gm;
outText = inText.replace(rx,'');

To make this regular expression dynamic, use the RegExp object
constructor.

skip = 'something';
pattern = '^(?:(?!\\b' + skip + '\\b).)*$';
rx = new RegExp(pattern, 'gm');
outText = inText.replace(rx,'');

Also, one of your posts talks about linefeeds and the \r\n pattern.
This is OS dependent and linefeeds could also be just \r or \n.

Below is the working code. I'm extremely obliged and I hope the embedded
acknowledgment is sufficient, even though I don't expect to actively
broadcast the code. You're certainly a guru in my JavaScript book. The only
real change I had to make was the thing at the end to include the ends of
the lines. Your original version left a blank line, while I wanted to remove
those lines completely. By the way, I tested an earlier non-dynamic version
with Opera and it worked fine. I'll test the dynamic version tomorrow.

My main regret is that I still don't fully understand how it works... Rather
embarrassing, but looks like I'll have to break out the Perl manual
tomorrow.

function keepSelectedLines(keepString, blockOfText) {
// based on tips from Mark Szlazak
pattern = '^(?:(?!\\b' + keepString + '\\b).)*$[\r\n]*';
rx = new RegExp(pattern, 'gm');
blockOfText = blockOfText.replace(rx,'');
return blockOfText;
}

Jul 20 '05 #22

Thomas 'PointedEars' Lahn

Shannon Jacobs wrote:

Below is the working code. [...]
My main regret is that I still don't fully understand how it works... Rather
embarrassing, but looks like I'll have to break out the Perl manual
tomorrow.
Why, see the Reference:

http://devedge.netscape.com/library/...p.html#1010689
function keepSelectedLines(keepString, blockOfText) {
// based on tips from Mark Szlazak
pattern = '^(?:(?!\\b' + keepString + '\\b).)*$[\r\n]*';
This string literal contains a notation later to be used to create
a Regular Expression (RegExp object) that matches the beginning of
the text (^) followed by zero or more occurrences (*) of the following:

Match the following but don't remember the match (/?:/):
Match only if the following does _not_ match at this position (/?!/,
negative lookahead): a word boundary ("\\b" becoming /\b/), followed
by the value of `keepString', followed by a word boundary. If so,
match any single character except a line terminator (/./).

The above should match only if it is followed by the end of the
text ($), followed by zero or more occurrences (*) of any of
the characters ([...]) \r (carriage return) and \n (linefeed).
rx = new RegExp(pattern, 'gm');
This creates a RegExp object from the above string literal. The 'm'
(multiline) flag makes /^/ and /$/ match at the beginning and the end
of each line instead of only at the beginning and the end of the
text; the 'g' (global) flag makes it match all occurrences, not only
the first one.
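The effect of the two flags can be checked with a few lines of made-up text:

```javascript
const text = "first\nsecond\nthird";

// Without m, ^ and $ only match at the very start and end of the text.
console.log(/^second$/.test(text));    // false
// With m, they also match right after and right before a line break.
console.log(/^second$/m.test(text));   // true

// g additionally makes replace() act on every match, not just the first.
console.log(JSON.stringify(text.replace(/^/gm, "> ")));  // "> first\n> second\n> third"
```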

However, it should be noted that it fails if the above string literal,
especially the value of the `keepString' argument, contains
single-escaped or certain double-escaped sequences, e.g. "C:\blurb",
in which the string literal already turns "\b" into a backspace
character (U+0008), so the RegExp would look for a backspace instead
of a backslash followed by `b'; or "C:\\blurb", which would result in
/C:\blurb/mg, meaning /\b/ as word boundary. For this function, an input
of "C:\\\\blurb" would have to be used to get /C:\\blurb/ in the RegExp,
having /\\/ to match the literal backslash character (`\'), as it was
intended.

(AFAIS there is no general method with JavaScript to convert a string so
that it can be used as argument for the RegExp constructor function with
the resulting RegExp to match the string; simply inserting backslashes
will obviously not work as supposed in all cases.)
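The escaping levels described above can be checked directly; in a string literal "\b" is already the backspace character before the RegExp constructor sees it. A sketch:

```javascript
// In a string literal, "\b" is the backspace character U+0008.
console.log("C:\blurb".charCodeAt(2));          // 8

// "\\b" reaches the RegExp constructor as \b: a word boundary.
console.log(new RegExp("C:\\blurb").source);    // C:\blurb

// "\\\\" reaches it as \\, which matches one literal backslash.
const rx = new RegExp("C:\\\\blurb");
console.log(rx.test("C:\\blurb"));              // true (the text is C:\blurb)
```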
blockOfText = blockOfText.replace(rx,'');
Replaces matches of `rx' with the empty string (i.e. deletes the
matching substrings).
return blockOfText;
Returns the changed text.
}

HTH

PointedEars

Jul 20 '05 #23

Shannon Jacobs

Mark Szlazak wrote:
<snip of lengthy text describing goal of deleting lines that do not
include
a key string>

rx = /^(?:(?!\bsomething\b).)*$/gm;
outText = inText.replace(rx,'');

To make this regular expression dynamic, use the RegExp object
constructor.

skip = 'something';
pattern = '^(?:(?!\\b' + skip + '\\b).)*$';
rx = new RegExp(pattern, 'gm');
outText = inText.replace(rx,'');

Also, one of your posts talks about linefeeds and the \r\n pattern.
This is OS dependent and linefeeds could also be just \r or \n.

Below is the working code. I'm extremely obliged and I hope the
embedded acknowledgment is sufficient, even though I don't expect to
actively broadcast the code. You're certainly a guru in my JavaScript
book. The only real change I had to make was the thing at the end to
include the ends of the lines. Your original version left a blank
line, while I wanted to remove those lines completely. By the way, I
tested an earlier non-dynamic version with Opera and it worked fine.
I'll test the dynamic version tomorrow.

My main regret is that I still don't fully understand how it works...
Rather embarrassing, but looks like I'll have to break out the Perl
manual tomorrow. [Actually, I did look at the manual, and still don't
understand all of it, though I feel like the pair of \b is not really
required?]
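Whether the pair of \b is needed depends on whether the key should only count as a whole word. A sketch comparing the pattern with and without them; the key "log" and the sample lines are made up, and the key is assumed to contain no regex metacharacters:

```javascript
// Build a "this line does not contain key" test, optionally with
// word boundaries around the key.
function lacksKey(key, wholeWord) {
  const b = wholeWord ? "\\b" : "";
  return new RegExp("^(?:(?!" + b + key + b + ").)*$", "m");
}

for (const line of ["error log", "catalog entry", "no match"]) {
  console.log(JSON.stringify(line),
              "with \\b:", lacksKey("log", true).test(line),
              "without:", lacksKey("log", false).test(line));
}
// "catalog entry" lacks the word "log" but contains the substring, so only
// the version with \b would delete it.
```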

function keepSelectedLines(keepString, blockOfText) {
// based on tips from Mark Szlazak
pattern = '^(?:(?!\\b' + keepString + '\\b).)*$[\r\n]*';
rx = new RegExp(pattern, 'gm');
blockOfText = blockOfText.replace(rx,'');
return blockOfText;
}

(Apologies if this post appears twice, but something strange is going
on here... My newsreader definitely thinks I posted this reply
yesterday, but it seems to have disappeared, just as Mr. Szlazak
reported some of his posts had disappeared. I rather suspect that the
spammers' efforts are resulting in so much newsgroup pollution that
non-spam posts are getting caught in the crossfire. Hopefully the
Google routing will work better.)

Jul 20 '05 #24

Mark Szlazak

"Shannon Jacobs" <sh****@my-deja.com> wrote in message news:<3f***********************@news3.asahi-net.or.jp>...

Mark Szlazak wrote:
<snip of lengthy text describing goal of deleting lines that do not include
a key string>

rx = /^(?:(?!\bsomething\b).)*$/gm;
outText = inText.replace(rx,'');

To make this regular expression dynamic, use the RegExp object
constructor.

skip = 'something';
pattern = '^(?:(?!\\b' + skip + '\\b).)*$';
rx = new RegExp(pattern, 'gm');
outText = inText.replace(rx,'');

Also, one of your posts talks about linefeeds and the \r\n pattern.
This is OS dependent and linefeeds could also be just \r or \n.

Below is the working code. I'm extremely obliged and I hope the embedded
acknowledgment is sufficient, even though I don't expect to actively
broadcast the code. You're certainly a guru in my JavaScript book. The only
real change I had to make was the thing at the end to include the ends of
the lines. Your original version left a blank line, while I wanted to remove
those lines completely. By the way, I tested an earlier non-dynamic version
with Opera and it worked fine. I'll test the dynamic version tomorrow.

My main regret is that I still don't fully understand how it works... Rather
embarrassing, but looks like I'll have to break out the Perl manual
tomorrow.

function keepSelectedLines(keepString, blockOfText) {
// based on tips from Mark Szlazak
pattern = '^(?:(?!\\b' + keepString + '\\b).)*$[\r\n]*';
rx = new RegExp(pattern, 'gm');
blockOfText = blockOfText.replace(rx,'');
return blockOfText;
}

NOTE: I've tried posting this in two previous replies which again seem
to be lost.

Thanks Shannon! This regex isn't original and it's probably more
commonly known among Perl programmers.

I have a suggestion. If you want to consume the linefeeds then you
don't need $ in the regex. Change $[\r\n]* to [\r\n]+
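A quick check of this suggestion on made-up text. One difference worth knowing: [\r\n]+ requires a line break after the line, so an unwanted last line with no trailing newline is left in place, whereas $[\r\n]* also removes it.

```javascript
var text = 'keep a\ndrop\nkeep b\ndrop last';

// $ followed by optional line breaks.
var withDollar = text.replace(/^(?:(?!keep).)*$[\r\n]*/gm, '');
// -> 'keep a\nkeep b\n'

// Suggested version: at least one line break, no $.
var withPlus = text.replace(/^(?:(?!keep).)*[\r\n]+/gm, '');
// -> 'keep a\nkeep b\ndrop last'
```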

Here's how I think about this regex. Starting at the position before the
first character of the line, the negative lookahead checks that its
subpattern does not begin at that position; if it doesn't, the "dot"
matches any character except a line terminator and moves us to the next
position, just after that character. This repeats until the end of the
line, unless the negative lookahead's subpattern is found, in which case
there is no match. Now, the caret ^ at the beginning of the regex
eliminates "bump-alongs" when the negative lookahead's subpattern is
found. Without it, the regex engine would do our scanning all over again,
starting from the next position in the line. Again, if no match is found
(e.g., the lookahead's subpattern is found), it bumps along to start at
the next position, re-does the scan, and this bumping-along could
continue to the end of the line.

You want to suppress this because it's not needed, it will not match
the entire line, and it will cause false matches: the engine can move
past, say, the "s" in "something" and start scanning from "omething...",
where a negative lookahead that has "something" as its subpattern no
longer finds it.

At least I think that's how this works ;-)
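The effect of the caret can be checked directly. A small sketch on a made-up one-line string: with ^ the line containing "something" is left alone, but without it the engine bumps along past the "s" and matches "omething b" instead, which is the false match described above.

```javascript
var line = 'a something b';

// Anchored: no match anywhere, so the line is unchanged.
var anchored = line.replace(/^(?:(?!something).)*$/gm, '');  // 'a something b'

// Unanchored: attempts starting at positions 0-2 fail, but the attempt
// that starts at the "o" succeeds, because "something" no longer begins there.
var unanchored = line.replace(/(?:(?!something).)*$/gm, ''); // 'a s'
```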

Jul 20 '05 #25

Shannon Jacobs

Not sure what to make of it, but my original post showed up again after a
couple of days. Maybe server problems at my end?

Mark Szlazak wrote:
<snip>

Also, one of your posts talks about linefeeds and the \r\n pattern.
This is OS dependent and linefeeds could also be just \r or \n.
<snip>
[My first derived version]
function keepSelectedLines(keepString, blockOfText) {
// based on tips from Mark Szlazak
pattern = '^(?:(?!\\b' + keepString + '\\b).)*$[\r\n]*';
rx = new RegExp(pattern, 'gm');
blockOfText = blockOfText.replace(rx,'');
return blockOfText;
}

I have a suggestion. If you want to consume the linefeeds then you
don't need $ in the regex. Change $[\r\n]* to [\r\n]+

I'll probably try that suggestion, but I already went in and removed the \b
pair. I'm not sure why you recommended those. Actually, the first person I
showed it to also wanted to be able to do two keys at a time. That turned
out to be easy by entering the keepString as:

(key1|key2)
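A minimal sketch of the two-key case, assuming the variant without the \b pair and some made-up log lines. keepString goes into the pattern unescaped, which is exactly what lets the | alternation work:

```javascript
// keepSelectedLines without the \b pair.
function keepSelectedLines(keepString, blockOfText) {
  var pattern = '^(?:(?!' + keepString + ').)*$[\r\n]*';
  var rx = new RegExp(pattern, 'gm');
  return blockOfText.replace(rx, '');
}

var sample = 'ERROR disk\nINFO ok\nWARN low\nINFO done';
var kept = keepSelectedLines('(ERROR|WARN)', sample); // 'ERROR disk\nWARN low\n'
```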

However, I did run into one problem already... The operation is inconsistent
with Japanese, which uses a DBCS (part of the time). I suspected it might be
one of those byte-alignment problems, but that doesn't seem to make sense if
the regexp is trying to match from every byte position...

And thanks for the explanation of how it works. Already seen a couple, but
that seems to be another aspect of regexp newsgroups?

Jul 20 '05 #26

Mark Szlazak

"Shannon Jacobs" <sh****@my-deja.com> wrote in message news:<3f***********************@news3.asahi-net.or.jp>...

<snip>
I'll probably try that suggestion, but I already went in and removed the \b
pair. I'm not sure why you recommended those.
<snip>
However, I did run into one problem already... The operation is inconsistent
with Japanese, which uses a DBCS (part of the time).
<snip>

The \b's are for word boundaries. See what happens when one line
has "Java" but not "JavaScript" and another line has "JavaScript"
but not "Java" with this negative lookahead (?!Java)
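The Java/JavaScript example, written out as a small sketch with made-up lines: without \b, the lookahead also finds "Java" inside "JavaScript", so both lines count as containing the key; with \b on both sides only the first line does.

```javascript
var lines = 'I like Java\nI like JavaScript';

// (?!Java): "Java" is found in both lines, so nothing is removed.
var loose = lines.replace(/^(?:(?!Java).)*$[\r\n]*/gm, '');      // unchanged

// (?!\bJava\b): "JavaScript" is not the word "Java", so line 2 goes.
var strict = lines.replace(/^(?:(?!\bJava\b).)*$[\r\n]*/gm, ''); // 'I like Java\n'
```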

JavaScript 1.5 regular expressions only count ASCII letters, digits and
the underscore as word characters, so \w and \b are not defined for
Japanese or most other Unicode characters. However, you can specify
Unicode character ranges in hex. The following regex would match the
halfwidth Katakana (U+FF65 to U+FF9F) listed in the Japanese code page
of this table,
http://www.microsoft.com/globaldev/r...e/dbcs/932.htm

katakana = /[\uff65-\uff9f]/;
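A small sketch of both points, with the Japanese text written as \u escapes so the example does not depend on the file's encoding: \b finds no word boundary at all in a purely Japanese string, while the hex range matches halfwidth Katakana but not the regular Katakana block (U+30A0 to U+30FF).

```javascript
var kanji = '\u65e5\u672c\u8a9e';           // 日本語
var halfwidth = '\uff76\uff80\uff76\uff85'; // halfwidth ｶﾀｶﾅ
var fullwidth = '\u30ab\u30bf\u30ab\u30ca'; // regular カタカナ

// \b needs a word character (ASCII letter, digit or _) on exactly one
// side, so it never matches inside this string.
var hasBoundary = /\b/.test(kanji);         // false

var katakana = /[\uff65-\uff9f]/;
var matchesHalf = katakana.test(halfwidth); // true
var matchesFull = katakana.test(fullwidth); // false
```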

Jul 20 '05 #27
