Hi Craig,
> I am trying to extract the '' part of it.
Part of your question remains unclear to me, so I can only work with
assumptions:
1. Which HTML sample do you want to match? The first, the second, or
both? I will assume both.
2. What part do you want to extract? I think you left that part out.
From your Regex, you apparently want to match the part within the
<b>...</b> tags. That is what I will assume.
Here is a Regex to match it (turn on "Dot matches newline" mode for it
to work):
Property\s+ID:\s*</span></td>\s*<td.*?><b>(?:[ ]*)(.*)</b></td>
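
A minimal sketch of applying this Regex, written in Python purely for illustration (the language in use is not stated in the question). re.DOTALL is Python's "dot matches newline" switch. The sample HTML and the ID value are made up, since the actual markup is not reproduced here, and the helper name extract_property_id is hypothetical.

```python
import re

# The pattern is copied from the post; re.DOTALL lets "." cross line breaks.
PATTERN = re.compile(
    r"Property\s+ID:\s*</span></td>\s*<td.*?><b>(?:[ ]*)(.*)</b></td>",
    re.DOTALL,
)

# Hypothetical sample: the second <td> tag is split over two lines,
# so <td.*?> only matches when "dot matches newline" is on.
SAMPLE_HTML = """<tr>
<td class="label"><span>Property ID: </span></td>
<td class="value"
    align="left"><b>  AB-1234</b></td>
</tr>"""


def extract_property_id(html):
    """Return the text inside <b>...</b> after 'Property ID:', or None."""
    match = PATTERN.search(html)
    return match.group(1) if match else None


print(extract_property_id(SAMPLE_HTML))  # AB-1234
```

Note that (.*) is greedy: if the page holds several such cells, the capture runs up to the last </b></td>. Writing (.*?) instead stops at the first one.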
Points to note:
----------------------
1. Instead of \s\ * , I have used \s+ where there will be at least one
space, and \s* where there might be zero or more spaces. This can be
changed to \s* in all cases. \s matches spaces, tabs and line breaks.
2. If a line breaks at a position the Regex does not allow for (for
example between </span> and </td>), the Regex will not match.
3. In order to match zero or more special entities, I have used
(?:[ ]*). This will not store the entity in a backreference.
4. If you're using .NET, you can turn on "Dot matches newline" mode
using the RegexOptions.Singleline option.
5. Regexes can only match very specific strings. Usually you can relax
them a bit for spaces and line breaks, but not for other characters. For
instance, if an entity such as &nbsp; is inserted anywhere in the string
other than within the <b>...</b> tags, the Regex will not match. So, if
you're expecting very diverse HTML fragments, you would be better off
with Larry's suggestion of using HTMLAgilityPack. It can be downloaded
from:
http://www.codefluent.com/smourier/d...gilitypack.zip
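
To illustrate the parser-based route from point 5, here is a rough sketch in Python. It uses the standard library's html.parser as a stand-in and is not HTMLAgilityPack's API, which is a .NET library. The sample markup, the class PropertyIdParser and the function extract_with_parser are all hypothetical. The sample puts an &nbsp; inside the label, a variation that the pattern Property\s+ID: does not allow for.

```python
from html.parser import HTMLParser


class PropertyIdParser(HTMLParser):
    """Collect the text of the first <b> element after a 'Property ID:' label."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &nbsp; into characters.
        super().__init__(convert_charrefs=True)
        self.seen_label = False  # True once the label text has been seen
        self.in_bold = False     # True while inside the <b> after the label
        self.value = None        # Extracted text, or None if not found
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "b" and self.seen_label and self.value is None:
            self.in_bold = True

    def handle_endtag(self, tag):
        if tag == "b" and self.in_bold:
            self.in_bold = False
            self.value = "".join(self._parts).strip()

    def handle_data(self, data):
        if self.in_bold:
            self._parts.append(data)
        elif "Property ID:" in " ".join(data.split()):
            # split() also splits on non-breaking spaces, so the label is
            # recognised however its words are separated.
            self.seen_label = True


def extract_with_parser(html):
    """Return the text of the <b> following the label, or None."""
    parser = PropertyIdParser()
    parser.feed(html)
    parser.close()
    return parser.value


# Hypothetical sample with an entity inside the label and before the value.
ENTITY_SAMPLE = """<table><tr>
<td class="label"><span>Property&nbsp;ID:</span></td>
<td class="value"><b>&nbsp;AB-1234</b></td>
</tr></table>"""

print(extract_with_parser(ENTITY_SAMPLE))  # AB-1234
```

The parser works on the document structure rather than on the exact character sequence, which is why extra entities or attributes between the tags do not disturb it.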
HTH,
Regards,
Cerebrus.