Bytes IT Community

Regular expressions

Hi guys,

I need some help regarding regular expressions. Consider the following
statement :

System.Text.RegularExpressions.Match match =
    System.Text.RegularExpressions.Regex.Match(requestPath, "([^/]*?\\.ashx)");

(where requestPath is a string)

What does the regex [^/]*?\\.ashx actually do? How come * and ?
occur consecutively?
Doesn't '?' require some text/block of text before it?
Also, is the expression read left to right or right to left?
i.e. is the backslash sequence grouped as '\\' followed by '.', or as '\'
followed by '\.'? If it is the former, why is it not written as \\\. and
if the latter, what does the orphaned backslash do?

Hope that's not too many questions - I'm too confused !

Thanks !

Mar 7 '07 #1
3 Replies


Oops, I guess that should go to the CSharp forum, but do let me know
if you can help me.

Thanks !

On Mar 7, 5:34 pm, "Zeba" <coolz...@gmail.com> wrote:
[original question quoted in full; snipped]

Mar 7 '07 #2

Regular Expressions are a powerful way to match patterns of characters in
strings.

The Regular Expression engine is basically procedural in nature, examining a
string one character at a time, but although it moves from left to right
through the string, it has the capability to move (jump) backwards as well,
and to keep track of multiple matches, groups, and so on.
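
To make the left-to-right scanning concrete, here is a minimal C# sketch. It assumes a modern .NET SDK where top-level statements are allowed; the sample text and the pattern are made up for illustration and are not from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "cat bat cat";   // hypothetical sample input

// Regex.Match reports the LEFTMOST match, not the last one.
Match m = Regex.Match(text, "(c)(at)");
Console.WriteLine(m.Index);            // 0
Console.WriteLine(m.Groups[1].Value);  // c
Console.WriteLine(m.Groups[2].Value);  // at

// NextMatch continues scanning to the right of the previous match.
Console.WriteLine(m.NextMatch().Index);  // 8
```

Each pair of parentheses is a numbered group, which is how the engine "keeps track" of the pieces of a match.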

What it does is to use a syntax that identifies sequences of characters in a
string. In your example,

[^/]

is what is called a "character class." A character class is a set of
characters, listed in any order, and it matches exactly one character from
that set. The set is identified by the [square brackets] surrounding it. A
'^' as the first character inside the brackets negates the class, which
means it matches any single character that is NOT in the set. The '/'
character is the only character in this particular set, so [^/] matches any
one character other than a forward slash.
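
A small C# sketch of the character class on its own, assuming a modern .NET SDK with top-level statements. The sample path is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input; any string would do.
string path = "/handlers/image.ashx";

// [^/] matches exactly ONE character that is not a forward slash.
Match first = Regex.Match(path, "[^/]");
Console.WriteLine(first.Value);   // the first non-slash character: h
Console.WriteLine(first.Index);   // its position in the string: 1

// [/] (no caret) matches exactly one forward slash instead.
Console.WriteLine(Regex.Matches(path, "[/]").Count);  // 2
```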

The character following the character class is a quantifier. It indicates
how many times in a row the character class may match. The '*' character
signifies "zero or more." Some other quantifiers are '+' (one or more), '?'
(zero or one), and numbers in curly brackets, for example: {2}
(exactly 2), {1,5} (between 1 and 5 inclusive).
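
The quantifiers listed above can be tried side by side. This is only a sketch, assuming a modern .NET SDK with top-level statements; the helper name First and the sample string are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first match of pattern in input
// (an empty string if there is no match).
string First(string input, string pattern) => Regex.Match(input, pattern).Value;

string digits = "ab12345cd";   // hypothetical sample input

Console.WriteLine(First(digits, "[0-9]+"));     // one or more: 12345
Console.WriteLine(First(digits, "[0-9]{2}"));   // exactly two: 12
Console.WriteLine(First(digits, "[0-9]{1,3}")); // one to three, greedy: 123
Console.WriteLine(First(digits, "b[0-9]?"));    // 'b' plus zero or one digit: b1

// "Zero or more" can succeed with nothing: there are no digits at index 0,
// so the first match is the empty string.
Console.WriteLine(First(digits, "[0-9]*").Length); // 0
```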

The '?' following the '*' in this case is NOT a quantifier. Its meaning is
determined by its context in the pattern. If it immediately followed the
character class it would be a quantifier, but because it follows a
quantifier, it modifies that quantifier. It makes the quantifier "lazy" as
opposed to "greedy." Quantifiers are "greedy" by default. That is, a greedy
quantifier first takes as many characters as it can, and gives some back
only if the rest of the pattern would otherwise fail to match. A lazy
quantifier does the opposite: it takes as few characters as it can, and
takes more only when the rest of the pattern cannot match yet.
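
Applied to the pattern from the question, a rough C# sketch might look like this. It assumes a modern .NET SDK with top-level statements, and the request path is hypothetical since the original post does not show one. It also touches on the backslash question: in a regular C# string literal, "\\" stands for a single backslash, so the regex engine sees \. (a literal dot).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical request path; the original post does not show one.
string requestPath = "/site/handlers/image.ashx";

// In a regular C# string literal, "\\" is ONE backslash character, so the
// regex engine receives ([^/]*?\.ashx) and reads \. as a literal dot.
string escaped = "([^/]*?\\.ashx)";
// A verbatim literal (@"...") spells the same pattern with a single backslash.
string verbatim = @"([^/]*?\.ashx)";
Console.WriteLine(escaped == verbatim);    // True

Match match = Regex.Match(requestPath, escaped);
Console.WriteLine(match.Success);          // True
Console.WriteLine(match.Groups[1].Value);  // image.ashx
```

Because [^/] can never cross a slash, the only place the pattern can succeed is the last path segment, so the match is the handler file name.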

For example, if you are looking for an HTML tag in a document, you might
think the following would work:

<.+>

(a left angle bracket, followed by any non-line-break character one or
more times, followed by a right angle bracket)

If you were looking at the following HTML:

<a href="blah">Click Here</a>

You might think that it would capture the opening tag. However, it would
capture the entire string. Why? Because the right angle bracket in the
opening tag is not a line-break character, so '.+' matches it too. Yes, the
match MUST end in a right angle bracket. However, since the quantifier is
greedy, '.+' first runs to the end of the line and then backs up only as far
as the last right angle bracket.

If you were to use the following instead:

<.+?>

It would stop at the first right-angle bracket. This is because the '?'
means that as few non-line-break characters as possible should match before
the right angle bracket.
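
The difference between the greedy and lazy forms can be checked directly on the HTML from the example. A minimal sketch, assuming a modern .NET SDK with top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

string html = "<a href=\"blah\">Click Here</a>";

// Greedy: .+ runs to the end of the line, then backs up to the LAST '>'.
Console.WriteLine(Regex.Match(html, "<.+>").Value);
// <a href="blah">Click Here</a>

// Lazy: .+? grows one character at a time and stops at the FIRST '>'.
Console.WriteLine(Regex.Match(html, "<.+?>").Value);
// <a href="blah">
```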

You could also do the following:

<[^>]+>

This means that no right angle bracket can be part of the match before the
one that ends it, so the match stops at the first right angle bracket
whether the quantifier is greedy or not.
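
As a sketch of the negated-class version, this collects every tag in the same HTML string. It assumes a modern .NET SDK with top-level statements; using LINQ to turn the matches into an array is just one convenient way to do it.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string html = "<a href=\"blah\">Click Here</a>";

// [^>]+ cannot cross a '>', so each match ends at the nearest closing bracket.
string[] tags = Regex.Matches(html, "<[^>]+>")
                     .Cast<Match>()
                     .Select(m => m.Value)
                     .ToArray();

Console.WriteLine(string.Join(" | ", tags));
// <a href="blah"> | </a>
```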

Here's a good reference on using Regular Expressions with the .Net platform:

http://msdn2.microsoft.com/en-us/library/hs600312.aspx

--
HTH,

Kevin Spencer
Microsoft MVP

Help test our new betas,
DSI PrintManager, Miradyne Component Libraries:
http://www.miradyne.net

Mar 7 '07 #3

Thanks ! That was very helpful.

On Mar 7, 6:17 pm, "Kevin Spencer" <unclechut...@nothinks.com> wrote:
[previous reply quoted in full; snipped]

Mar 8 '07 #4
