Bytes | Developer Community

Regex Help Required

Hello,

I've got the following regular expression:

"<span .*bbc_underline.*>(.*)</span>"

and the following input string:

"this <span class="bbc_underline">is underlined <span
class="bbc_strikethrough">and striked through</span>text.</span>"

When I do a replace with this expression, I get the following result:

"this text."

I'm quite new to regular expressions. The goal of the expression above is to
yield the following result:

"this [u]is underlined [s]and striked through[/s]text.[/u]"

But somehow, the first expression eats up more than it should. By the way, I don't
know in advance whether, or how, the spans are nested. I just have a couple of spans
with given classes that will "turn into" square brackets. Can somebody please help?
A ton of thanks in advance.

Matthias
Mar 5 '07 #1
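The result "this text." follows from how a greedy `.*` backtracks. A minimal Python sketch reproduces it. The thread is about .NET, but Python's `re` is also a backtracking engine and behaves the same way for this pattern; `\1` plays the role of .NET's `$1`, and replacing with `$1` is an assumption, since the post does not show the replacement string.

```python
import re

# The input from the post on one line (the line break in the post is
# assumed to be wrapping only).
text = ('this <span class="bbc_underline">is underlined <span '
        'class="bbc_strikethrough">and striked through</span>text.</span>')

# The original greedy pattern: each ".*" first runs to the end of the string
# and then gives characters back only until the rest of the pattern matches.
greedy = r'<span .*bbc_underline.*>(.*)</span>'

m = re.search(greedy, text)
print(m.group(1))                   # text.
print(re.sub(greedy, r'\1', text))  # this text.
```

The second `.*` backs off only as far as the `>` that ends the inner `</span>`, so the whole rest of the string is consumed and the group is left with just "text.".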
Regex quantifiers are "greedy" by default: they grab the largest chunk they can
while still letting the rest of the pattern match.
To make one "lazy" (grab the smallest chunk it can), use "*?".
So, I think your Regex would be
"<span .*?bbc_underline.*?>(.*?)</span>"
Each of the ".*?" will find the minimum number of characters rather than the
maximum.
Give it a try.
Ethan
Mar 5 '07 #2
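Applied to the same input, the lazy pattern stops at the first `</span>`, which belongs to the inner span. A minimal Python sketch (Python's `re` treats `*?` the same way as .NET here; the `[u]\1[/u]` replacement is chosen only to make the match boundaries visible):

```python
import re

text = ('this <span class="bbc_underline">is underlined <span '
        'class="bbc_strikethrough">and striked through</span>text.</span>')

# Ethan's lazy pattern: each ".*?" takes as few characters as possible.
lazy = r'<span .*?bbc_underline.*?>(.*?)</span>'

# Hypothetical replacement that marks where the match starts and ends.
result = re.sub(lazy, r'[u]\1[/u]', text)
print(result)
# this [u]is underlined <span class="bbc_strikethrough">and striked through[/u]text.</span>
```

The `[/u]` lands where the strikethrough span closes, which is the problem described in the next reply.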
Hey Ethan,

Thanks for your help, but it does not work correctly either, since it
matches the first occurrence of </span> instead of the last.

In case I did not explain it clearly: what I want to achieve is to
match the first occurrence of <span SOMETHING "bbc_underline"> with the last
possible occurrence of </span> and replace the whole match with $1.
I don't know how many levels of nesting there are. The span classes I use come
from a predefined list (bbc_underline, bbc_italic, bbc_strikethrough and the like).

I would greatly appreciate further help.

Matthias
Mar 5 '07 #3
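Pairing the first opening tag with the last `</span>` works for the nested example, but a Python sketch with a hypothetical second input containing two sibling spans shows where that rule breaks. Using `[^>]*` instead of `.*` inside the opening tag, so that part cannot run past the tag, is an assumption and not from the thread.

```python
import re

# First matching opening tag up to the *last* </span> (greedy group).
first_to_last = r'<span [^>]*bbc_underline[^>]*>(.*)</span>'

nested = ('this <span class="bbc_underline">is underlined <span '
          'class="bbc_strikethrough">and striked through</span>text.</span>')
# Hypothetical input: two sibling spans instead of nested ones.
siblings = ('<span class="bbc_underline">a</span> and '
            '<span class="bbc_italic">b</span>')

print(re.sub(first_to_last, r'[u]\1[/u]', nested))
# this [u]is underlined <span class="bbc_strikethrough">and striked through</span>text.[/u]
print(re.sub(first_to_last, r'[u]\1[/u]', siblings))
# [u]a</span> and <span class="bbc_italic">b[/u]
```

In the second case the underline swallows the italic span, because "last `</span>`" is not the same as "matching `</span>`".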
The only way of replacing nested tags is to start from the innermost tag.

Make a pattern that matches your tag, but only if there is not another
such tag inside it. Then you can successfully match the innermost tag
and replace it, then repeat the process until there are no more matches.

--
Göran Andersson
_____
http://www.guffa.com
Mar 5 '07 #4
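The innermost-first approach can be sketched in Python as follows (Python is used here only for illustration). The mapping `TAGS`, the pattern name `INNERMOST_SPAN` and the function `spans_to_brackets` are hypothetical, and the pattern assumes the class is written as `class="bbc_..."`. The tempered token `(?:(?!</?span\b).)*` lets the content run only until the next span tag, so only a span with no other span inside it can match.

```python
import re

# Hypothetical mapping from the "bbc_*" class suffix to the bracket tag;
# the thread names bbc_underline, bbc_italic and bbc_strikethrough.
TAGS = {'underline': 'u', 'italic': 'i', 'strikethrough': 's'}

# A span with one of the known classes whose content contains neither an
# opening nor a closing span tag, i.e. an innermost span.
INNERMOST_SPAN = re.compile(
    r'<span[^>]*?class="bbc_(' + '|'.join(TAGS) + r')"[^>]*>'
    r'((?:(?!</?span\b).)*)'
    r'</span>',
    re.DOTALL,  # let the content run across line breaks
)

def spans_to_brackets(text):
    """Replace innermost spans first and repeat until nothing matches."""
    def to_brackets(m):
        tag = TAGS[m.group(1)]
        return f'[{tag}]{m.group(2)}[/{tag}]'

    while True:
        text, count = INNERMOST_SPAN.subn(to_brackets, text)
        if count == 0:  # no known innermost span is left
            return text

text = ('this <span class="bbc_underline">is underlined <span '
        'class="bbc_strikethrough">and striked through</span>text.</span>')
print(spans_to_brackets(text))
# this [u]is underlined [s]and striked through[/s]text.[/u]
```

Each pass converts every current innermost span at once, so the loop runs once per nesting level. A span whose class is not in `TAGS` is left alone, and it also blocks conversion of any span that encloses it.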
Hello Matthias,
> thanks for your help, but it does not work correctly either, since it
> matches the first occurrence of </span> instead of the last.
>
> In case I did not explain it clearly: what I want to achieve is to
> match the first occurrence of <span SOMETHING "bbc_underline"> with the last
> possible occurrence of </span> and replace the whole match with $1.

If I understand you correctly, you want balanced matching. That is
actually possible in .NET - here's a post that explains how:
http://blogs.msdn.com/bclteam/archiv...15/396452.aspx

Let me know if that's what you need but you can't get it to work. I don't
want to work on it unless it really helps you :-)
Oliver Sturm
--
http://www.sturmnet.org/blog
Mar 5 '07 #5
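.NET's balancing groups, which the linked post describes, have no equivalent in Python's standard `re` module. As a swapped-in illustration of the same idea (count opening and closing tags until they balance), here is a Python sketch that finds the *matching* `</span>` for the first bbc_underline span by hand. The helper `find_balanced_span` and its return shape are hypothetical.

```python
import re

# Any opening <span ...> tag, or a closing </span>.
SPAN_TOKEN = re.compile(r'<span\b[^>]*>|</span>')

def find_balanced_span(text, class_name):
    """Hypothetical helper: locate the first <span> whose opening tag contains
    class_name and its *matching* </span>, counting nested spans on the way.
    Returns (start, end, inner), or None if there is no balanced match."""
    opener = re.search(r'<span\b[^>]*' + re.escape(class_name) + r'[^>]*>', text)
    if opener is None:
        return None
    depth = 1  # we are inside the opener
    for tok in SPAN_TOKEN.finditer(text, opener.end()):
        depth += -1 if tok.group() == '</span>' else 1
        if depth == 0:
            # tok is the closing tag that balances the opener
            return opener.start(), tok.end(), text[opener.end():tok.start()]
    return None  # ran out of tokens: unbalanced input

text = ('this <span class="bbc_underline">is underlined <span '
        'class="bbc_strikethrough">and striked through</span>text.</span>')
start, end, inner = find_balanced_span(text, 'bbc_underline')
print(text[:start] + '[u]' + inner + '[/u]' + text[end:])
# this [u]is underlined <span class="bbc_strikethrough">and striked through</span>text.[/u]
```

Unlike the "last `</span>`" rule, this pairs sibling spans correctly. The returned `inner` text still contains the nested spans, so it would have to be processed the same way for each of them.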

This discussion thread is closed

Replies have been disabled for this discussion.
