Regular Expression Hangs

Has anyone ever heard of the Regex.IsMatch and Regex.Match methods just
hanging and eventually producing the message "Requested Service not
found"?

I have the following pattern:

^(?<OrgCity>([A-Z][\w ]+)+), City of, (?<OrgState>(([A-Z][A-Z])|([A-Z][a-z]+\.)))( \((?<OrgCountry>[\w ]{2,})\))?$

trying to match the following data:

ENVIRONMENTAL RESTORATION AND WASTE MANAGEMENT, DOE (US)

The following code will simply hang for a long time then write the
message "Requested Service not found" to the debug console:

Regex myRegex = new Regex(
    @"^(?<OrgCity>([A-Z][\w ]+)+), City of, (?<OrgState>(([A-Z][A-Z])|([A-Z][a-z]+\.)))( \((?<OrgCountry>[\w ]{2,})\))?$",
    RegexOptions.Compiled);

myRegex.IsMatch("ENVIRONMENTAL RESTORATION AND WASTE MANAGEMENT, DOE (US)");

May 18 '07 #1


Well, your regular expression is a mess for starters. I would suggest an
alternative, but you haven't given us any rules regarding the pattern(s)
you're trying to match. What you said was:

> trying to match the following data:
>
> ENVIRONMENTAL RESTORATION AND WASTE MANAGEMENT, DOE (US)

That is obviously not true. You wouldn't need a regular expression to match
a single fixed string. A regular expression searches for patterns in
strings. Those patterns are defined by rules that are expressed in the
regular expression. And the regular expression you posted, besides being a
mess (I will get to that), couldn't possibly match that string, since it
contains the literal ", City of, " - which is nowhere to be found in your
posted string.

The reason it's a mess is that you have many more Groups than you probably
know. You have 3 named Groups ("OrgCity," "OrgState," and "OrgCountry"), but
you also have FIVE unnamed Groups, and you're using backreferencing, so I'm
not sure where the compiler is throwing up on you.
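
The group count can be checked directly rather than by eye. The following is a minimal sketch, assuming the pattern exactly as posted; the class name GroupCount is made up for illustration. It uses Regex.GetGroupNames(), which also lists group "0" (the whole match), and RegexOptions.ExplicitCapture, which stops unnamed parentheses from capturing. Note that the pattern as posted contains no backreference construct such as \1 or \k<name>, so the group count is the part of this claim that can be verified.

```csharp
using System;
using System.Text.RegularExpressions;

class GroupCount
{
    // The pattern from the original post, joined onto one line.
    const string Pattern =
        @"^(?<OrgCity>([A-Z][\w ]+)+), City of, (?<OrgState>(([A-Z][A-Z])|([A-Z][a-z]+\.)))( \((?<OrgCountry>[\w ]{2,})\))?$";

    static void Main()
    {
        // Default behaviour: every pair of plain parentheses captures.
        var regex = new Regex(Pattern);
        string[] all = regex.GetGroupNames();
        Console.WriteLine(string.Join(", ", all));
        // Subtract 1 because group "0" is the whole match.
        Console.WriteLine("capturing groups: " + (all.Length - 1));

        // ExplicitCapture: only (?<name>...) groups capture.
        var namedOnly = new Regex(Pattern, RegexOptions.ExplicitCapture);
        string[] named = namedOnly.GetGroupNames();
        Console.WriteLine(string.Join(", ", named));
        Console.WriteLine("capturing groups: " + (named.Length - 1));
    }
}
```

With the pattern as posted this should report eight capturing groups by default (five unnamed plus three named) and three under ExplicitCapture. Dropping the unused captures reduces bookkeeping, but it does not by itself change how much the engine backtracks.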

Because I don't know the rules, I can't really give you a full answer.
However, I can tell you this much:

"^(?<OrgCity>[\w\s]+),"

will capture the following:

ENVIRONMENTAL RESTORATION AND WASTE MANAGEMENT,

Everything but the comma will be in the Group "OrgCity"

"\((?<OrgCountry>[\w\s]{2,})\)?$"

will capture the following:

(US)

Everything but the parentheses will be in the group "OrgCountry"

As for your third Group, I simplified the regular expression to the
following, which has the same rules:

(?<OrgState>[A-Z]{2}|[A-Z][a-z]+\.)

Briefly, it captures 1 of 2 possible patterns:
2 Capital letters
-or-
1 Capital letter followed by 1 or more lower-case letters, followed by a
period
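
The three fragments above can be tried in isolation. This is only a sketch: the class name SubPatternDemo and the sample state strings "TX", "Calif." and "Texas" are assumptions added for illustration, and the OrgState fragment is wrapped in ^...$ here so that it is tested against a whole candidate string rather than searched for inside the organisation name.

```csharp
using System;
using System.Text.RegularExpressions;

class SubPatternDemo
{
    static void Main()
    {
        const string input = "ENVIRONMENTAL RESTORATION AND WASTE MANAGEMENT, DOE (US)";

        // Everything before the first comma lands in OrgCity.
        Match city = Regex.Match(input, @"^(?<OrgCity>[\w\s]+),");
        Console.WriteLine("OrgCity:    " + city.Groups["OrgCity"].Value);

        // The parenthesised suffix at the end lands in OrgCountry.
        Match country = Regex.Match(input, @"\((?<OrgCountry>[\w\s]{2,})\)?$");
        Console.WriteLine("OrgCountry: " + country.Groups["OrgCountry"].Value);

        // Two capitals, or a capital plus lower-case letters and a period.
        // The sample strings are hypothetical test data, not from the thread.
        var state = new Regex(@"^(?<OrgState>[A-Z]{2}|[A-Z][a-z]+\.)$");
        foreach (string candidate in new[] { "TX", "Calif.", "Texas" })
        {
            Console.WriteLine(candidate + ": " + state.IsMatch(candidate));
        }
    }
}
```

"TX" and "Calif." should be accepted, while "Texas" should be rejected because it is neither two capitals nor terminated by a period.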

That's the best I can do!

--
HTH,

Kevin Spencer
Microsoft MVP

Printing Components, Email Components,
FTP Client Classes, Enhanced Data Controls, much more.
DSI PrintManager, Miradyne Component Libraries:
http://www.miradyne.net

May 18 '07 #2

I think you missed the point. My post was not a request for help matching
some pattern. It's about why the regex library behaves so unpredictably.

I actually intended for the pattern NOT to match that string. I see
your point about the unnecessary capturing groups, but
they're not a problem for what I'm trying to capture.

May 18 '07 #3

I wish you had told us that you weren't looking for a solution before I
tried to solve your problem! :P

Oh well. The result of the groupings was probably the cause, as it created a
large number of capturing groups (8: 3 named plus 5 unnamed), some of which
were nested inside others. I'm thinking that the combination of nested groups
and unintentional self-backreferences caused some sort of recursion overflow,
but that's just a guess.

--
HTH,

Kevin Spencer
Microsoft MVP


May 18 '07 #4

This regex caused my Vista installation to bluescreen for the very first
time since I installed it in November. Congratulations on that :).

The cause:
(?<OrgCity>([A-Z][\w ]+)+)

This allows for an enormous amount of backtracking. A slightly improved variant:

(?<OrgCity>([A-Z][\w]+ )+)

actually gives speedy and stable results (notice how I moved the
space out of the character class, so it is no longer part of the inner
repetition).

Though a bluescreen should never have been caused of course.
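
The difference can be observed safely by giving the regex a match timeout. The sketch below is illustrative only: it relies on the Regex constructor overload that takes a TimeSpan and on RegexMatchTimeoutException (both available since .NET Framework 4.5, so not at the time of this thread), and the class name, helper name and 2-second limit are arbitrary choices. Because [A-Z] is a subset of \w, ([A-Z][\w ]+)+ accepts the same text as a single [A-Z][\w ]+, so the second pattern is offered here as an assumed alternative that keeps what OrgCity matches. The variant above, by contrast, requires a space after every word, so it is fast but will not match a city name that is followed directly by the comma.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class BacktrackingDemo
{
    const string Input = "ENVIRONMENTAL RESTORATION AND WASTE MANAGEMENT, DOE (US)";

    // The rest of the original pattern, unchanged.
    const string Tail =
        @", City of, (?<OrgState>(([A-Z][A-Z])|([A-Z][a-z]+\.)))( \((?<OrgCountry>[\w ]{2,})\))?$";

    // Hypothetical helper: runs one variant and reports how it ended.
    static void Try(string label, string cityPattern)
    {
        // The TimeSpan argument makes the engine abandon the match
        // instead of backtracking indefinitely.
        var regex = new Regex("^" + cityPattern + Tail,
                              RegexOptions.None,
                              TimeSpan.FromSeconds(2));
        var watch = Stopwatch.StartNew();
        try
        {
            bool matched = regex.IsMatch(Input);
            Console.WriteLine(label + ": matched=" + matched +
                              " after " + watch.ElapsedMilliseconds + " ms");
        }
        catch (RegexMatchTimeoutException)
        {
            Console.WriteLine(label + ": timed out after " +
                              watch.ElapsedMilliseconds + " ms");
        }
    }

    static void Main()
    {
        // Nested quantifier from the original post.
        Try("nested quantifier", @"(?<OrgCity>([A-Z][\w ]+)+)");
        // Single quantifier accepting the same text for OrgCity.
        Try("single quantifier", @"(?<OrgCity>[A-Z][\w ]+)");
    }
}
```

On a backtracking engine the nested form is expected to hit the timeout on this input, because the all-capitals name can be split into inner repetitions in a very large number of ways before the engine concludes that ", City of, " is absent. The single-quantifier form should simply report no match almost immediately. Exact timings will vary by runtime version.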

Jesse
May 18 '07 #5

I submitted a bug report to the .NET Framework bug tracker on Microsoft Connect.

Please vote for it here:
https://connect.microsoft.com/Visual...dbackID=277745

Jesse
May 19 '07 #6
