Bytes IT Community

Finding position of a RegExp subexpression

I need to come up with a function
function regExpPos (text, re, parenNum) { ... }
that will return the position within text of RegExp.$parenNum if there
is a match, and -1 otherwise.

For example:
var re = /some(thing|or other)?.*(n(est)(?:ed)?.*(parens) )/
var text = "There were some nesting parens in the test";
alert (regExpPos (text, re, 3));

should show 17
Would anyone have one of these?
Csaba Gabor from Vienna

Apr 21 '06 #1
7 Replies


Csaba Gabor said the following on 4/21/2006 1:23 PM:
I need to come up with a function
function regExpPos (text, re, parenNum) { ... }
that will return the position within text of RegExp.$parenNum if there
is a match, and -1 otherwise.


There is one already. indexOf :)
Never tried it with RegExp's though :)
--
Randy
comp.lang.javascript FAQ - http://jibbering.com/faq & newsgroup weekly
Javascript Best Practices - http://www.JavascriptToolbox.com/bestpractices/
Apr 21 '06 #2

Randy Webb wrote:
Csaba Gabor said the following on 4/21/2006 1:23 PM:
I need to come up with a function
function regExpPos (text, re, parenNum) { ... }
that will return the position within text of RegExp.$parenNum if there
is a match, and -1 otherwise.


There is one already. indexOf :)
Never tried it with RegExp's though :)


The problem with
function regExpPos (text, re, parenNum) {
if (!text.match(re)) return -1;
return text.indexOf(RegExp['$'+parenNum], RegExp.leftContext.length)
}

is that RegExp['$'+parenNum] may not be unique within text (though it
is in the example that I gave). So if I change text to
var text = "There were some questionable nesting parens in the test";
regExpPos (text, re, 3) would return 18 instead of the correct 30.
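
The pitfall can be reproduced with a small standalone sketch. It uses the array returned by exec (m.index and m[n]) instead of the RegExp.$n statics; the name naiveRegExpPos is made up for this illustration and is not part of the thread's code.

```javascript
// Naive approach: search for the captured text, starting at the match start.
// naiveRegExpPos is a hypothetical name used only for this illustration.
function naiveRegExpPos(text, re, parenNum) {
  var m = re.exec(text);
  if (!m || m[parenNum] === undefined) return -1;
  // finds the FIRST occurrence of the captured text at or after the match
  // start, which need not be the occurrence the group actually matched
  return text.indexOf(m[parenNum], m.index);
}

var re = /some(thing|or other)?.*(n(est)(?:ed)?.*(parens) )/;
var text1 = "There were some nesting parens in the test";
var text2 = "There were some questionable nesting parens in the test";

console.log(naiveRegExpPos(text1, re, 3)); // 17
console.log(naiveRegExpPos(text2, re, 3)); // 18, the "est" in "questionable"
```

In text2 the group really matched the "est" of "nesting" at position 30, so the search lands on the wrong occurrence.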

Csaba

By the way, thanks for that ear piercing demo in the other thread. :)

In short, text.indexOf(RegExp['$'+parenNum], pos) finds the position of
a substring within the string, but the trouble is that
RegExp['$'+parenNum] may not be unique within the string.

Apr 21 '06 #3

Csaba Gabor said the following on 4/21/2006 2:48 PM:
Randy Webb wrote:
Csaba Gabor said the following on 4/21/2006 1:23 PM:
I need to come up with a function
function regExpPos (text, re, parenNum) { ... }
that will return the position within text of RegExp.$parenNum if there
is a match, and -1 otherwise.

There is one already. indexOf :)
Never tried it with RegExp's though :)


The problem with
function regExpPos (text, re, parenNum) {
if (!text.match(re)) return -1;
return text.indexOf(RegExp['$'+parenNum], RegExp.leftContext.length)
}

is that RegExp['$'+parenNum] may not be unique within text (though it
is in the example that I gave). So if I change text to
var text = "There were some questionable nesting parens in the test";
regExpPos (text, re, 3) would return 18 instead of the correct 30.


My knowledge of RegExp's may not be good enough to understand them, so I
may be reading it wrong, but if you want the last match, then
lastIndexOf gives it. -1 if no match.
Csaba

By the way, thanks for that ear piercing demo in the other thread. :)


It does a better job than coffee at 5 am :)

--
Randy
comp.lang.javascript FAQ - http://jibbering.com/faq & newsgroup weekly
Javascript Best Practices - http://www.JavascriptToolbox.com/bestpractices/
Apr 21 '06 #4

JRS: In article <11**********************@j33g2000cwa.googlegroups .com>
, dated Fri, 21 Apr 2006 10:23:41 remote, seen in
news:comp.lang.javascript, Csaba Gabor <da*****@gmail.com> posted :
I need to come up with a function
function regExpPos (text, re, parenNum) { ... }
that will return the position within text of RegExp.$parenNum if there
is a match, and -1 otherwise.

For example:
var re = /some(thing|or other)?.*(n(est)(?:ed)?.*(parens) )/
var text = "There were some nesting parens in the test";
alert (regExpPos (text, re, 3));

should show 17


If you can alter the RegExp by inserting extra parentheses so that
everything is matched, then you could sum the lengths of all lower
matches.
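
A minimal sketch of that idea, assuming the RegExp has been rewritten by hand so that capturing groups 1 to parenNum-1 are not nested and together cover everything from the start of the match up to the target group. The names posBySummingGroups and flatRe are invented for this illustration.

```javascript
// Assumes groups 1..parenNum-1 are consecutive, non-nested, and tile the
// match from its start up to group parenNum (a hand-made rewrite).
// posBySummingGroups and flatRe are hypothetical names.
function posBySummingGroups(text, re, parenNum) {
  var m = re.exec(text);
  if (!m) return -1;
  var pos = m.index;
  for (var k = 1; k < parenNum; ++k) {
    pos += (m[k] || "").length; // a group that did not take part counts as 0
  }
  return pos;
}

// Hand-flattened variant of the RegExp from the question; "est" is group 5.
var flatRe = /(some)(thing|or other)?(.*)(n)(est)(?:ed)?.*parens /;

console.log(posBySummingGroups(
  "There were some nesting parens in the test", flatRe, 5)); // 17
console.log(posBySummingGroups(
  "There were some questionable nesting parens in the test", flatRe, 5)); // 30
```

The flattening was done by hand here; doing it automatically for an arbitrary RegExp is the hard part.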

Or you could then, with .replace, substitute all lower matches to "",
and see by how much the length has changed.

But I don't know whether that would always work with sufficiently
complex RegExps.

You could .replace the parameter in question with an Unreasonable String
(it is, after all, Unicode) and then do indexOf(that US).

Note : if the original string is less than 2^16 characters long, there
must be at least one "16-bit" Unicode character that it does not
contain. So to find a one-character US, start searching for each
possible character in turn (starting with the least plausible) until you
find one that is not there.
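
The search for a one-character US might look like the sketch below. Starting at U+FFFF and counting down is only a guess at "least plausible first", and findUnusedChar is a made-up name.

```javascript
// Returns a single 16-bit character that does not occur in text, or null
// if every one of the 65536 code units is present.
// findUnusedChar is a hypothetical helper name.
function findUnusedChar(text) {
  for (var code = 0xFFFF; code >= 0; --code) {
    var chr = String.fromCharCode(code);
    if (text.indexOf(chr) === -1) return chr;
  }
  return null;
}

var marker = findUnusedChar("There were some nesting parens in the test");
console.log(marker.charCodeAt(0).toString(16)); // ffff
```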

Untested.

--
John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4
<URL:http://www.jibbering.com/faq/> JL/RC: FAQ of news:comp.lang.javascript
<URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
<URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.
Apr 21 '06 #5

Dr John Stockton wrote:
JRS: In article <11**********************@j33g2000cwa.googlegroups .com>
, dated Fri, 21 Apr 2006 10:23:41 remote, seen in
news:comp.lang.javascript, Csaba Gabor <da*****@gmail.com> posted :
I need to come up with a function
function regExpPos (text, re, parenNum) { ... }
that will return the position within text of RegExp.$parenNum if there
is a match, and -1 otherwise.

For example:
var re = /some(thing|or other)?.*(n(est)(?:ed)?.*(parens) )/
var text = "There were some nesting parens in the test";
alert (regExpPos (text, re, 3));

should show 17

If you can alter the RegExp by inserting extra parentheses so that
everything is matched, then you could sum the lengths of all lower
matches.


This is, in effect, what I have done; code is provided below. However,
it is a nontrivial process that must account for nested parentheses
(...(...()...()...)...(...()...)...), back references (\#), and
non-capturing subexpressions (?:...).

Or you could then, with .replace, substitute all lower matches to "",
and see by how much the length has changed.

But I don't know whether that would always work with sufficiently
complex RegExps.

You could .replace the parameter in question with an Unreasonable String
(it is, after all, Unicode) and then do indexOf(that US).


I appreciate the brainstorming. Back references render the remaining
above ideas unworkable, as far as I can tell. Below is a function I
coded up which does the job. It works by introducing parens ending at
the start of the specified capturing parens [those are parens that
don't start with (?:] and stretching back to the start of the
containing capturing parens. Of course the containing paren's position
must be identified, too, so you get the idea this is recursive. The
complete listing of the function in all its gory glory follows (not
extensively tested).

Csaba Gabor from Vienna
function regExpPos (text, re, parenNum) {
  // returns the starting position of the parenNum-th capturing parens
  // of the RegExp, re, when matching text; -1 if not successful
  if (!parenNum) { // terminating case: position of the whole match
    if (!text.match(re)) return -1;
    return RegExp.leftContext.length;
  }
  var i, j, aParen, src = re.source;
  if (arguments.length < 4) { // initial entry - this section determines
    // opening and closing positions of all capturing parens
    var code, chr;
    var aOpen = []; // stack of unclosed parens: index into aParen,
                    // or -1 for a (?: non-capturing group
    aParen = [[0, src.length]];
    var mode = 0; // 0 => normal, 1 => character class []
    for (i = 0; i < src.length; ++i) {
      if ((chr = src.charAt(i)) == "\\") { ++i; continue; }
      if (mode) { if (chr == "]") mode = 0; continue; }
      if (chr == "[") { mode = 1; continue; }
      if (chr == "(") {
        // only (?: is treated as non-capturing; lookaheads are not handled
        if (src.substr(i + 1, 2) != "?:") {
          aParen.push([i, -1]);
          aOpen.push(aParen.length - 1);
        } else aOpen.push(-1);
      } else if (chr == ")") {
        // a ")" closes the most recently opened paren, capturing or not
        j = aOpen.pop();
        if (j >= 0) aParen[j][1] = i;
      }
    }
    if (parenNum >= aParen.length) { // no such group: return end of match
      if (!text.match(re)) return -1;
      return (RegExp.leftContext.length + RegExp.lastMatch.length);
    }
  } else aParen = arguments[3];

  // step 1 - find the containing parens (cP, aCP)
  var aTP = aParen[parenNum]; // parenNum's start, end position
  for (var cP = parenNum; cP--;) if (aParen[cP][1] > aTP[1]) break;
  var res, aP2, aCP = aParen[cP]; // containing paren's start, end pos

  // step 2 - avoid introducing extra level of parens
  // for when cP to parenNum is completely filled with parens
  for (i = parenNum, aP2 = [i]; --i > cP;)
    if (aParen[aP2[aP2.length-1]][0] == aParen[i][1] + 1)
      aP2[aP2.length] = i;
  if (aParen[aP2[aP2.length-1]][0] == aCP[0] + 1) {
    if (!text.match(re)) return -1;
    // sum the lengths of the sibling groups that precede parenNum
    for (res = 0, i = aP2.length; --i;) res += RegExp['$' + aP2[i]].length;
    return res + (!cP ? RegExp.leftContext.length :
                        regExpPos(text, re, cP, aParen));
  }

  // step 3 - insert parens from start of cP to start of parenNum;
  // the new group gets number cP+1 and captures the text preceding parenNum
  src = src.slice(0, i = aCP[0]) + "(" +
        src.slice(i, i = aTP[0]) + ")" + src.slice(i);

  // step 4 - renumber single-digit back references to groups above cP,
  // since the inserted parens shift those groups up by one
  for (i = 0; i < src.length; ++i) {
    if ((chr = src.charAt(i)) == "\\") {
      if (!mode && (code = src.charCodeAt(i + 1)) < 57 &&
          (code >= 48 + (cP + 1)))
        src = src.slice(0, i + 1) + String.fromCharCode(code + 1) +
              src.slice(i + 2);
      ++i;
      continue;
    }
    if (mode) { if (chr == "]") mode = 0; continue; }
    if (chr == "[") { mode = 1; continue; }
  }

  // step 5 - do the regular expression
  var rex = new RegExp(src); // note: the flags of re are not carried over
  if (!text.match(rex)) return -1;
  return RegExp['$' + (cP + 1)].length +
         (!cP ? RegExp.leftContext.length :
                regExpPos(text, re, cP, aParen));
}
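
To see the paren-insertion step in isolation, here is a hand-worked sketch for group 3 of the example RegExp. It is an illustration of the idea only, not a replacement for the function: the two offsets are looked up with indexOf for this one pattern, and the variable names are made up.

```javascript
var text = "There were some questionable nesting parens in the test";
var src = /some(thing|or other)?.*(n(est)(?:ed)?.*(parens) )/.source;

// Start of the containing parens (group 2) and of the target (group 3),
// located by hand for this particular pattern.
var outerStart = src.indexOf("(n");
var targetStart = src.indexOf("(est");

// Wrap the part of group 2 that precedes group 3 in new parens.
// The new parens become group 3; its length is the offset inside group 2.
var inner = src.slice(0, outerStart) + "(" +
            src.slice(outerStart, targetStart) + ")" + src.slice(targetStart);
var m1 = new RegExp(inner).exec(text);

// Same trick one level up: wrap everything before group 2 in new parens.
var outer = "(" + src.slice(0, outerStart) + ")" + src.slice(outerStart);
var m2 = new RegExp(outer).exec(text);

var pos = m2.index + m2[1].length + m1[3].length;
console.log(inner); // some(thing|or other)?.*((n)(est)(?:ed)?.*(parens) )
console.log(pos);   // 30
```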

Apr 22 '06 #6

"Csaba Gabor" <da*****@gmail.com> writes:
I need to come up with a function
function regExpPos (text, re, parenNum) { ... }
that will return the position within text of RegExp.$parenNum if there
is a match, and -1 otherwise.


I can't see an immediate way that works with all regexps and/or
texts. You only get the value of the group match, and that can be very
un-unique in the string, and even in the match. The only index you
ever get is the index of the entire match.
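
For readers on a current engine: ECMAScript 2022 added the d (hasIndices) flag, which makes exec report start and end offsets for every group. It did not exist when this thread was written. A minimal sketch, assuming an engine that supports the flag; the name regExpPosIndices is made up.

```javascript
// Sketch only: relies on the ES2022 "d" flag and the indices array that
// exec then attaches to its result. regExpPosIndices is a hypothetical name.
function regExpPosIndices(text, re, parenNum) {
  var flags = re.flags.indexOf("d") < 0 ? re.flags + "d" : re.flags;
  var m = new RegExp(re.source, flags).exec(text);
  if (!m) return -1;
  var span = m.indices[parenNum]; // [start, end], or undefined
  return span ? span[0] : -1;
}

var re = /some(thing|or other)?.*(n(est)(?:ed)?.*(parens) )/;
console.log(regExpPosIndices(
  "There were some nesting parens in the test", re, 3)); // 17
console.log(regExpPosIndices(
  "There were some questionable nesting parens in the test", re, 3)); // 30
```

A group that did not take part in the match has no entry in indices, so the sketch returns -1 for it as well.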

/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
Apr 22 '06 #7

JRS: In article <ad********************@comcast.com>, dated Fri, 21 Apr
2006 15:00:08 remote, seen in news:comp.lang.javascript, Randy Webb
<Hi************@aol.com> posted :
Csaba Gabor said the following on 4/21/2006 2:48 PM:
Randy Webb wrote:
Csaba Gabor said the following on 4/21/2006 1:23 PM:
I need to come up with a function
function regExpPos (text, re, parenNum) { ... }
that will return the position within text of RegExp.$parenNum if there
is a match, and -1 otherwise.
There is one already. indexOf :)
Never tried it with RegExp's though :)


The problem with
function regExpPos (text, re, parenNum) {
if (!text.match(re)) return -1;
return text.indexOf(RegExp['$'+parenNum], RegExp.leftContext.length)
}

is that RegExp['$'+parenNum] may not be unique within text (though it
is in the example that I gave). So if I change text to
var text = "There were some questionable nesting parens in the test";
regExpPos (text, re, 3) would return 18 instead of the correct 30.


My knowledge of RegExp's may not be good enough to understand them, so I
may be reading it wrong, but if you want the last match, then
lastIndexOf gives it. -1 if no match.


ISTM that, if he had wanted that, he would have said so. After all, the
Viennese are good at English.
Testing such as

R = ("12j3456789").match(/(\d)(\d)(\d)(\d)/)
A = R['lastIndex']

suggests that A is indeed the index at which to start the next match,
and
A = R['lastIndex'] - R[R.length-1].length

is therefore the beginning of the last match.
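
As far as I know, lastIndex on the array returned by match is a JScript (IE) extension. In standard ECMAScript the array carries index (the start of the whole match), so the end of the match has to be computed. A sketch of the same test in portable form, with made-up variable names:

```javascript
var R = "12j3456789".match(/(\d)(\d)(\d)(\d)/);

var start = R.index;                 // start of the whole match
var end = R.index + R[0].length;     // where the next match would begin
// Valid only because the last group ends exactly where the match ends.
var lastGroupStart = end - R[R.length - 1].length;

console.log(start, end, lastGroupStart); // 3 7 6
```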

So, Csaba, you just need a RegExp that edits RegExps to have only n
matches, and a question very similar to the original is already
answered.

It looks as if RegExp.leftContext.length *may* actually answer the
modified question, but IE4 appears not to have leftContext.

Small Flanagan asserts that IE4 has neither leftContext nor lastIndex.
<FAQENTRY> The FAQ needs a good link or two, and a supporting entry, for
RegExp.

--
John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4
<URL:http://www.jibbering.com/faq/> JL/RC: FAQ of news:comp.lang.javascript
<URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
<URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.
Apr 22 '06 #8
