Hi Mr. Nobody,
Actually, Regex is quite capable of handling this sort of situation. In the
solution I gave you I went for the simplest solution necessary, as I
understood it. Your second example was using an image tag, which would not
contain other tags. In a case where other tags might be nested, you would
need to use a different set of Regex tools.
For example, to get all text between 2 matching beginning and ending tags,
when there are no nested tags, you would use something like:
<([^>]*)>([^<]*)</\1>
This indicates that a match begins with a left angle bracket. The left
angle bracket is followed by a sequence of any length of characters that are
NOT a right angle bracket. This sequence of characters is put into Group 1.
Next comes the right angle bracket that closes the opening tag, then a
sequence of any length of characters that are NOT a left angle bracket
(Group 2, the content), followed by a left angle bracket and a
forward-slash. The last part of the match is that the text from the first
tag (Group 1) is matched again, followed by a right angle bracket.
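As a quick check of that pattern, here is a minimal sketch in Python (my
choice of language for illustration only; the pattern itself is not tied to
it). The sample string and the name SIMPLE_TAG are made up for the example.

```python
import re

# The pattern from above: group 1 is everything inside the opening tag,
# group 2 is the text up to the next "<".
SIMPLE_TAG = re.compile(r"<([^>]*)>([^<]*)</\1>")

# Hypothetical input, for illustration only.
sample = "<b>bold</b> and <i>italic</i>"

matches = [(m.group(1), m.group(2)) for m in SIMPLE_TAG.finditer(sample)]
print(matches)  # [('b', 'bold'), ('i', 'italic')]

# Group 1 captures attributes too, so the closing tag would have to repeat
# them; a tag with attributes therefore does not match this pattern.
print(SIMPLE_TAG.search('<a href="x">link</a>'))  # None
```

Because Group 1 takes everything up to the first right angle bracket, this
version only suits opening tags without attributes.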
For tags that contain nested tags, something like the following might work:
<(table|form|div)[^>]*>(.*?)</\1>
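This indicates that tables, forms, and divs (I have probably missed one or
two container tags) are matched. Group 1 captures the tag name, and the
ending tag reuses it through \1. Group 2 contains the content. Note that the
lazy .*? stops at the first matching closing tag, and the dot only crosses
line breaks if the option that lets it match newlines is turned on.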
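Here is a minimal sketch of that second pattern in Python (again just an
illustration language, not necessarily what you are using). The sample
markup and the name CONTAINER are made up. I am assuming the content may
span several lines, so the sketch turns on re.DOTALL to let the dot match
newlines.

```python
import re

# The pattern from above. re.DOTALL is an assumption on my part: it lets
# "." match newlines so the content can span several lines.
CONTAINER = re.compile(r"<(table|form|div)[^>]*>(.*?)</\1>", re.DOTALL)

# Hypothetical input, for illustration only.
sample = '<div class="x"><p>Hello <b>world</b></p>\n<img src="a.png"></div>'

m = CONTAINER.search(sample)
print(m.group(1))        # div
print(repr(m.group(2)))  # the inner markup, other tags included

# The lazy .*? stops at the FIRST matching closing tag, so a div nested
# inside a div cuts the content short.
nested = CONTAINER.search("<div><div>a</div>b</div>")
print(nested.group(2))   # <div>a
```

So this handles other tags nested inside the container, but not a container
nested inside another one of the same name.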
--
HTH,
Kevin Spencer
Microsoft MVP
Chicken Salad Surgery
It takes a tough man to make a tender chicken salad.
"MrNobody" <Mr******@discussions.microsoft.com> wrote in message
news:6A**********************************@microsoft.com...
Kevin, thanks for that tip, it works great for that example!
So there is no way in regex to say something like, accept all characters
until you hit a specific group of characters, like "</div>"? Like let's say
you are scanning a web page for a specific opening <div> tag, and you want
to grab all the text between that and the next closing </div> tag, so the
contents may include many <'s and >'s inside. I guess regex is not the way
to go for doing something like that?