RegEx to find a word not enclosed in parentheses

I have some text and I need to find the occurrences of a word that are not
enclosed in parentheses. Can it be done with a regex? Could someone help me?
I am not familiar with regex...

Example looking for WORD:
(there is a WORD in ( my string WORD )) and * WORD * to (find WORD)
and * WORD *

This should give me the two WORDs between stars (the stars are not part of
the string).

thanks a lot

Oct 31 '06 #1
1 Reply


I don't believe this can be done using Regular Expressions, at least not
practically. I'll tell you why:

In order to identify the WORD you're looking for, the only rule that can be
applied is that it is preceded by the exact same number of left and right
parentheses. That means that the number of left parentheses before the WORD
and the number of right parentheses before the WORD must be the same,
whether 0 or more, but the exact same number of each.

In addition, the left and right parentheses have to be in order; that is, if
there are 2 left parentheses, they must be followed (at some point) by 2
right parentheses. In other words, you can't have 1 left parenthesis
followed by 2 right parentheses followed by 1 left parenthesis, and you
can't start with a right parenthesis. Any parentheses before the WORD must
consist of 1 or more left parentheses, followed by some sequence of 0 or
more characters that is NOT "WORD", followed by the exact same number of
right parentheses.
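
To make the rule concrete, here is a minimal C# sketch that checks whether a
given position in a string is preceded by properly matched parentheses. The
class and method names are made up for illustration only:

```csharp
using System;

public static class ParenRule
{
    // Hypothetical helper: true when every '(' before position 'index'
    // has been closed by a ')' and no ')' appeared without a partner.
    public static bool IsOutsideParentheses(string text, int index)
    {
        int open = 0; // left parentheses still waiting for a partner
        for (int i = 0; i < index && i < text.Length; i++)
        {
            if (text[i] == '(') open++;
            else if (text[i] == ')')
            {
                if (open == 0) return false; // right parenthesis with no partner
                open--;
            }
        }
        return open == 0;
    }

    public static void Main()
    {
        string text = "(a WORD) b WORD";
        Console.WriteLine(IsOutsideParentheses(text, 3));  // the WORD inside the parentheses
        Console.WriteLine(IsOutsideParentheses(text, 11)); // the WORD after them
    }
}
```

For the sample string this prints False for the first WORD and True for the
second.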

Since Regular Expressions, at least in their classic form, do not have the
capacity to count, this can't practically be done using Regular Expressions.
However, as I was able to determine the rule for identifying WORD, I also
have some idea of how it might be done using string and character
manipulation.

Since you're looking for the occurrences of a string within a string, you
don't need to actually extract the matches, only to know the indices at
which the search string occurs within the origin string. That is, once
you know those indices, and you know what the search string is, you can
find every occurrence within the string any time you need to.
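
As a small illustration of that point, the sketch below assumes the indices
are already known (the value here is hand-computed for a made-up string, not
produced by any search) and recovers each occurrence with Substring, since
every match has the same length as the search string. The names are
hypothetical:

```csharp
using System;

public static class IndexDemo
{
    // Hypothetical helper: returns the text found at each known index.
    public static string[] Slice(string origin, string searchString, int[] indices)
    {
        string[] found = new string[indices.Length];
        for (int i = 0; i < indices.Length; i++)
            found[i] = origin.Substring(indices[i], searchString.Length);
        return found;
    }

    public static void Main()
    {
        string origin = "(skip WORD) keep WORD";
        int[] indices = { 17 }; // hand-computed for this example string
        foreach (string s in Slice(origin, "WORD", indices))
            Console.WriteLine(s);
    }
}
```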

You would need 2 variables, one to keep a count of left parentheses, and one
to keep a count of right parentheses. When you hit a left parenthesis,
increment the left parenthesis variable; when you hit a right parenthesis,
increment the right parenthesis variable. If the 2 variables are not of equal
value, you don't do anything. If they are, you begin to check the characters
following for the search string ("WORD"). Here's an example. I've tested
this using all possible combinations, with one exception. It assumes that
left and right parentheses will always be in left-right order. That is, if
there is a stray parenthesis, or if the parentheses are somehow reversed in
the string, it may not work as advertised, and you may need to revise it:

// Place these two directives at the top of the file.
using System;
using System.Collections.Generic;

/// <summary>
/// Finds the indices of all occurrences of <paramref name="searchString"/>
/// found in <paramref name="origin"/> that are not
/// enclosed within parentheses.
/// </summary>
/// <param name="origin">String to Search.</param>
/// <param name="searchString">String to Find.</param>
/// <returns>An array of the indices of all occurrences of
/// <paramref name="searchString"/> found in <paramref name="origin"/>
/// that are not enclosed within parentheses,
/// or an empty integer array if not found.</returns>
public static int[] IndicesWithoutParentheses(string origin, string searchString)
{
    // Nothing can be found in, or with, an empty string
    if (string.IsNullOrEmpty(origin) || string.IsNullOrEmpty(searchString))
        return new int[0];

    int leftCount = 0, rightCount = 0;
    int searchLength = searchString.Length;
    List<int> indices = new List<int>(); // holds indices found

    // Iterate through the origin string
    for (int originIndex = 0; originIndex < origin.Length; originIndex++)
    {
        char c = origin[originIndex]; // Current char
        if (c == '(') leftCount++; // Count left parentheses
        else if (c == ')') rightCount++; // Count right parentheses
        else if (leftCount == rightCount &&
            string.CompareOrdinal(origin, originIndex, searchString, 0,
                searchLength) == 0)
        {
            // Counts are equal and searchString starts here: we have found one.
            // Assumes searchString itself contains no parentheses.
            indices.Add(originIndex);
            // Skip the rest of the match; the outer loop adds the last step.
            originIndex += searchLength - 1;
        }
    }
    return indices.ToArray();
}
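
For the exception mentioned above (stray or reversed parentheses), one
possible variation, which is not the method used in the function above, is
to track a single nesting depth and never let it drop below zero, so that an
unmatched right parenthesis is ignored instead of throwing the counts off.
This is only a sketch: the names are made up, and it assumes the search
string itself contains no parentheses.

```csharp
using System;
using System.Collections.Generic;

public static class DepthSearch
{
    // Hypothetical variant: returns the start index of every occurrence
    // of searchString that sits at nesting depth zero.
    public static int[] IndicesAtDepthZero(string origin, string searchString)
    {
        List<int> indices = new List<int>();
        if (string.IsNullOrEmpty(origin) || string.IsNullOrEmpty(searchString))
            return indices.ToArray();

        int depth = 0; // number of currently open left parentheses
        for (int i = 0; i < origin.Length; i++)
        {
            char c = origin[i];
            if (c == '(') depth++;
            else if (c == ')') { if (depth > 0) depth--; } // ignore a stray ')'
            else if (depth == 0 &&
                     string.CompareOrdinal(origin, i, searchString, 0,
                         searchString.Length) == 0)
            {
                indices.Add(i);
                i += searchString.Length - 1; // skip past the match
            }
        }
        return indices.ToArray();
    }

    public static void Main()
    {
        string text = "(there is a WORD in ( my string WORD )) and WORD to (find WORD) and WORD";
        Console.WriteLine(string.Join(", ", IndicesAtDepthZero(text, "WORD")));
    }
}
```

An unmatched left parenthesis still hides everything after it, so whether
this is the right behavior depends on how malformed input should be treated.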

--
HTH,

Kevin Spencer
Microsoft MVP
Short Order Coder
http://unclechutney.blogspot.com

The devil is in the yada yada yada

Oct 31 '06 #2
