"Barry Mossman" <BM************@gmail.com> wrote in
news:ew**************@TK2MSFTNGP11.phx.gbl:
"Chris R. Timmons" <crtimmons@X_NOSPAM_Xcrtimmonsinc.com> wrote
in message
news:Xn**********************************@207.46.248.16...

    Match m = Regex.Match(htmlText,
        @"<\s*?span.*?class\s*?=\s*?""serif"".*?[^>]*?>(?<contents>.*?)</\s*?span\s*?>",
        RegexOptions.Singleline |
        RegexOptions.IgnoreCase);
    return m.Groups["contents"].ToString();

Hi Chris,

I understand the string except for the "*?"s. What is the
impact of the "?"?

How does the behaviour differ from

    @"<\s*span.*class\s*=\s*""serif"".*[^>]*>(?<contents>.*)</\s*span\s*>",

Barry,

Normally, quantifiers like * and + are "greedy", which means they
match as many characters as possible while still letting the rest
of the regex match. Adding ? after a quantifier makes it
"non-greedy" (lazy), so it matches only the minimum number of
characters necessary. (Note that in the regex I posted, the \s*? is
equivalent to \s*, because each one is followed by a literal
non-whitespace character, so both forms consume the same run of
whitespace. I kind of went overboard with the ?s :-) ).

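To make the difference concrete for the two span patterns in this thread, here is a minimal C# sketch. The class name SpanDemo, the helper Extract, and the two-span sample input are assumptions made up for illustration; only the two patterns and the RegexOptions come from the posts above.

```csharp
using System;
using System.Text.RegularExpressions;

static class SpanDemo
{
    // Lazy pattern as originally posted in this thread.
    public const string Lazy =
        @"<\s*?span.*?class\s*?=\s*?""serif"".*?[^>]*?>(?<contents>.*?)</\s*?span\s*?>";

    // All-greedy variant from the question.
    public const string Greedy =
        @"<\s*span.*class\s*=\s*""serif"".*[^>]*>(?<contents>.*)</\s*span\s*>";

    // Hypothetical helper: returns the "contents" group of the first match.
    public static string Extract(string htmlText, string pattern)
    {
        Match m = Regex.Match(htmlText, pattern,
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return m.Groups["contents"].Value;
    }

    static void Main()
    {
        // Made-up sample input containing two matching spans.
        string html =
            @"<span class=""serif"">one</span><span class=""serif"">two</span>";

        Console.WriteLine(Extract(html, Lazy));    // one
        Console.WriteLine(Extract(html, Greedy));  // two
    }
}
```

With the lazy pattern the match stops at the first closing span tag. With the greedy pattern the first .* runs ahead to the last class="serif" in the input, so the captured contents come from the second span.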
For example, assume the following input:

    <text>first</text><text>second</text>

Let's say I want to extract the text between the first set of <text>
tags. This greedy expression:

    <text>(.*)</text>

captures this text:

    first</text><text>second

See how the .* matched as much as possible?

If the regex is changed to:

    <text>(.*?)</text>

then only the minimum amount of text is captured:

    first
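
The same comparison can be run directly in C#. This is only a sketch: the class name GreedyDemo and the helper FirstGroup are hypothetical names chosen for the example, while the input and both patterns are the ones shown above.

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyDemo
{
    // Hypothetical helper: returns capture group 1 of the first match.
    public static string FirstGroup(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups[1].Value;
    }

    static void Main()
    {
        string input = "<text>first</text><text>second</text>";

        // Greedy: .* consumes everything, then backs off to the LAST </text>.
        Console.WriteLine(FirstGroup(input, @"<text>(.*)</text>"));
        // prints: first</text><text>second

        // Lazy: .*? grows one character at a time and stops at the FIRST </text>.
        Console.WriteLine(FirstGroup(input, @"<text>(.*?)</text>"));
        // prints: first
    }
}
```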
--
Hope this helps.
Chris.
-------------
C.R. Timmons Consulting, Inc.
http://www.crtimmonsinc.com/