Bytes IT Community

Help extracting a tag with boost::regex

MCH
Hi there,
I am working with HTML-like text using boost::regex. For example,
the following pattern might occur in my text:

<abc efg>
[TAG] <p>EFG</p[/TAG] 12<3>

In this case, I would like to extract everything between [TAG] and
[/TAG], replacing [TAG] with <pre> and [/TAG] with </pre>. Meanwhile,
everything outside [TAG]...[/TAG] should be unchanged, except that <
is replaced by &lt; and > is replaced by &gt;.

In a far more complicated case, a nested [TAG] might occur, as follows:

<abc efg>
[TAG] <p>EF [TAG] eee [/TAG] G</p[/TAG] 12<3>

In this case, the program should handle only the outermost TAG and
leave the inner TAG in place. I tried to implement the program with
boost::regex; however, it never seems to succeed in extracting the
TAG, even in the simple case.

Mar 12 '07 #1


On 12 Mar 2007 13:07:26 -0700 in comp.lang.c++, "MCH" <gm****@21cn.com>
wrote,
> <abc efg>
> [TAG] <p>EFG</p[/TAG] 12<3>
>
> In this case, I would like to extract everything between [TAG] and
> [/TAG], replacing [TAG] with <pre> and [/TAG] with </pre>. Meanwhile,
> everything outside [TAG]...[/TAG] should be unchanged, except that <
> is replaced by &lt; and > is replaced by &gt;.
This exceeds what I know how to do with regex. Too much context.
I would do it with std::string, std::find_first_of(),
and a couple of if statements.

Maybe you could pretend you are writing a Perl program and get some help
from the real regex experts over on comp.lang.perl.misc.

Mar 13 '07 #2

MCH wrote:
> Hi there,
> I am working with HTML-like text using boost::regex. For example,
> the following pattern might occur in my text:
>
> <abc efg>
> [TAG] <p>EFG</p[/TAG] 12<3>
>
> In this case, I would like to extract everything between [TAG] and
> [/TAG], replacing [TAG] with <pre> and [/TAG] with </pre>. Meanwhile,
> everything outside [TAG]...[/TAG] should be unchanged, except that <
> is replaced by &lt; and > is replaced by &gt;.
>
> In a far more complicated case, a nested [TAG] might occur, as follows:
>
> <abc efg>
> [TAG] <p>EF [TAG] eee [/TAG] G</p[/TAG] 12<3>
>
> In this case, the program should handle only the outermost TAG and
> leave the inner TAG in place. I tried to implement the program with
> boost::regex; however, it never seems to succeed in extracting the
> TAG, even in the simple case.
Try:
// Inside a C++ string literal, each regex backslash must be doubled.
const boost::regex expression("\\[TAG\\].*?\\[/TAG\\]");

The trick is to put '?' after .* to turn off greedy matching: instead
of matching the last occurrence of [/TAG], the pattern will match the
first [/TAG], which may or may not be what you want. It seems your
problem is not so much boost::regex as regular expression pattern
matching in general. I would recommend consulting regular expression
documentation first and experimenting with simpler patterns in
boost::regex.

Fei
Mar 14 '07 #3

Fei Liu wrote:
> MCH wrote:
>> Hi there,
>> I am working with HTML-like text using boost::regex. For example,
>> the following pattern might occur in my text:
>>
>> <abc efg>
>> [TAG] <p>EFG</p[/TAG] 12<3>
>>
>> In this case, I would like to extract everything between [TAG] and
>> [/TAG], replacing [TAG] with <pre> and [/TAG] with </pre>. Meanwhile,
>> everything outside [TAG]...[/TAG] should be unchanged, except that <
>> is replaced by &lt; and > is replaced by &gt;.
>>
>> In a far more complicated case, a nested [TAG] might occur, as follows:
>>
>> <abc efg>
>> [TAG] <p>EF [TAG] eee [/TAG] G</p[/TAG] 12<3>
>>
>> In this case, the program should handle only the outermost TAG and
>> leave the inner TAG in place. I tried to implement the program with
>> boost::regex; however, it never seems to succeed in extracting the
>> TAG, even in the simple case.
> Try:
> // Inside a C++ string literal, each regex backslash must be doubled.
> const boost::regex expression("\\[TAG\\].*?\\[/TAG\\]");
>
> The trick is to put '?' after .* to turn off greedy matching: instead
> of matching the last occurrence of [/TAG], the pattern will match the
> first [/TAG], which may or may not be what you want.
It's definitely not what's needed, since it will match the [TAG] of
the outer block to the [/TAG] of the inner block. See the "far more
complicated case" above.
> It seems your problem is not so much boost::regex as regular
> expression pattern matching in general. I would recommend consulting
> regular expression documentation first and experimenting with simpler
> patterns in boost::regex.
Right. Regular expressions don't deal well with recursive patterns.

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
Mar 20 '07 #4
