Bytes IT Community
Regular expression to match a nested quoted string

a
I need to write a regular expression to match a quoted string in which the
double quote character itself is represented by 2 double quotes. For
example:

"beginning ""nested quoted string"" end"

Any idea how to write this in boost::xpressive or boost::regex?

Thanks,

A
Sep 9 '06 #1
9 Replies


"a" <xx*****@pacbell.net> wrote in message
news:X2*******************@newssvr11.news.prodigy.com...
> I need to write a regular expression to match a quoted string in which the
> double quote character itself is represented by 2 double quotes. For
> example:
>
> "beginning ""nested quoted string"" end"
>
> Any idea how to write this in boost::xpressive or boost::regex.
>
> Thanks,
>
> A

I'm not familiar with boost, but in C++ you need to escape the " like \"
Try
"beginning \"nested quoted string\" end"
Sep 9 '06 #2

a wrote:
> I need to write a regular expression to match a quoted string in which the
> double quote character itself is represented by 2 double quotes. For
> example:
>
> "beginning ""nested quoted string"" end"

Quotes aren't special characters in any of the regular expression
grammars supported by TR1, so the regular expression is

""(.*)""

To write that as a C++ text string, add escapes to the quotes:

const char *rx = "\"\"(.*)\"\"";

That is, unless you're using the basic or grep grammars. In those cases
you need \( instead of (, so the regular expression is

""\(.*\)""

and the corresponding literal constant has more escapes:

const char *rx = "\"\"\\(.*\\)\"\"";

--

-- Pete

Author of "The Standard C++ Library Extensions: a Tutorial and Reference."
For more information about this book, see www.petebecker.com/tr1book.
Sep 9 '06 #3
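
The escaping described above can be cross-checked against a C++11 raw string literal, which did not exist when this thread was written. This is only a sketch: the variable names are made up for illustration, and the `re` delimiter is an arbitrary choice that keeps the `)"` inside the pattern from ending the literal early.

```cpp
#include <string>

// Each pair below spells the same regular expression text.

// Default (ECMAScript-style) grammar:  ""(.*)""
const std::string escaped_rx = "\"\"(.*)\"\"";       // escaped form from the post
const std::string raw_rx     = R"re(""(.*)"")re";    // raw literal, no backslashes

// basic/grep grammars, where the group is written \( ... \):  ""\(.*\)""
const std::string escaped_basic_rx = "\"\"\\(.*\\)\"\"";
const std::string raw_basic_rx     = R"re(""\(.*\)"")re";
```

Comparing `escaped_rx == raw_rx` is a quick way to confirm that a hand-escaped pattern says what you think it says.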

a
Thanks for your answers.

I'll clarify my question.

The issue for me is not how to escape quote characters in a C quoted string,
but rather what regular expression to use to match quoted strings that
contain other nested quoted strings.

The strings are coming from an external text file in Microsoft rc file
format used to describe resources used by an application. Here is a sample:

IDD_ADD_DUPLICATE
......
CAPTION "Duplicate Entry Found"
......
LTEXT "You are trying to add the entry""%s <%s>"" to your
list, but ""%s <%s>"" already exists.\n\n\nWould you like to merge these two
entries instead of creating a new one?",
.......
END

My expression must match the first string ("Duplicate Entry Found") as well
as the _entire_ string after LTEXT:

"You are trying to add the entry""%s <%s>"" to your list, but ""%s <%s>""
already exists.\n\n\nWould you like to merge these two entries instead of
creating a new one?"

If I use a simple regex for a quoted string, it will stop at the first " and
it will match only: "You are trying to add the entry".

So the question is how to make the expression recognize that one '"' is the
end of the string, but '""' is part of the string.

I tried using static regexes in boost::xpressive, but they generate runtime
stack overflows when I add patterns for the inner "" characters, so I'm
assuming the expressions are not correct.

Thanks,

A
Sep 9 '06 #4
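
To make the problem concrete, here is a minimal sketch that contrasts a simple quoted-string pattern with one that treats `""` as part of the string. It assumes std::regex (the standardised descendant of the TR1 and Boost libraries discussed here) rather than boost::xpressive, and the helper name `first_match` is hypothetical. It does not try to explain the xpressive stack overflow mentioned above.

```cpp
#include <regex>
#include <string>

// Returns the first match of `pattern` in `text`, or "" if there is none.
std::string first_match(const std::string& text, const std::string& pattern)
{
    const std::regex rx(pattern);
    std::smatch m;
    return std::regex_search(text, m, rx) ? m.str() : std::string();
}

// Simple quoted string: ends at the first '"' after the opening one.
const std::string naive_rx = R"re("[^"]*")re";

// Quote-aware: the body is any run of non-quote characters or doubled quotes,
// so a lone '"' ends the string while '""' stays inside it.
const std::string doubled_rx = R"re("(?:[^"]|"")*")re";
```

On a line such as `LTEXT "add the entry ""%s <%s>"" to your list",` the first pattern stops at `"add the entry "`, while the second runs to the real closing quote.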

On Sat, 09 Sep 2006 06:58:37 -0400 in comp.lang.c++, Pete Becker
<pe********@acm.org> wrote,
> a wrote:
>> I need to write a regular expression to match a quoted string in which the
>> double quote character itself is represented by 2 double quotes. For
>> example:
>>
>> "beginning ""nested quoted string"" end"
>
> Quotes aren't special characters in any of the regular expression
> grammars supported by TR1, so the regular expression is
>
> ""(.*)""

Close, but no cigar. That will match a string that has quotes at
the begin and end but the quotes inside (if any) might not be
doubled.

If a quoted string "A" is matched simply by /"[^"]*"/
or in c++, "\"[^\"]*\""

A string with possible double quotes in it "A""A" is just a series
of the above. So it is /("[^"]*")*/, or in c++ "(\"[^\"]*\")*".

#include <iostream>
#include <string>

#define BOOST_REGEX_NO_FILEITER
#include <boost/regex.hpp>

// Returns true if `what`, taken as a whole, is a series of adjacent "..."
// pieces. Note that the pattern also accepts the empty string.
bool check(const std::string& what)
{
    static const boost::regex expr("(\"[^\"]*\")*");
    boost::smatch tokens;
    const bool result = boost::regex_match(what, tokens, expr);
    std::cout << result << "\t(" << what << ")\n";
    return result;
}

int main()
{
    check("\"I am a duck\"");                  // plain quoted string
    check("\"I am a \"\" duck\"");             // one doubled quote inside
    check("\"I \"\"am\"\" a \"\" duck\"");     // several doubled quotes
    check("I am not a duck");                  // no quotes at all
    check("\"I am \"not\" a duck\"");          // inner quotes not doubled
    check("I am \"not\" a duck");              // text outside the quotes
    check("\"I am not \" a duck\"");           // stray single quote inside
}

Sep 9 '06 #5
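
The program above tests whole strings with regex_match. For the .rc file described earlier, the strings have to be found inside larger text, so here is a rough sketch of that step. It assumes std::regex instead of boost::regex, the function name `quoted_strings` is hypothetical, and the pattern uses `+` instead of `*` (my change, not the poster's) so that an unanchored search never yields empty matches.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every quoted string in `text`, treating "" as part of the string.
// A string is a run of one or more adjacent "..." pieces.
std::vector<std::string> quoted_strings(const std::string& text)
{
    static const std::regex rx("(\"[^\"]*\")+");
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it)
        out.push_back(it->str());   // str() is the whole match, outer quotes included
    return out;
}
```

Because `[^"]` also matches a newline, a string that was wrapped across lines is still returned as one item. Two separate strings written with nothing between them, as in `"a""b"`, cannot be told apart from one string containing a doubled quote; that ambiguity is in the notation, not in the pattern.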

David Harmon wrote:
> On Sat, 09 Sep 2006 06:58:37 -0400 in comp.lang.c++, Pete Becker
> <pe********@acm.org> wrote,
>> a wrote:
>>> I need to write a regular expression to match a quoted string in which the
>>> double quote character itself is represented by 2 double quotes. For
>>> example:
>>>
>>> "beginning ""nested quoted string"" end"
>>
>> Quotes aren't special characters in any of the regular expression
>> grammars supported by TR1, so the regular expression is
>>
>> ""(.*)""
>
> Close, but no cigar. That will match a string that has quotes at
> the begin and end but the quotes inside (if any) might not be
> doubled.

You're reading far too much into a vague specification. The regular
expression I gave matches the nested quoted string in the example.

--

-- Pete

Author of "The Standard C++ Library Extensions: a Tutorial and Reference."
For more information about this book, see www.petebecker.com/tr1book.
Sep 9 '06 #6

a
Actually David's regex does the job. It's my fault that my initial specs
weren't too clear - see my previous reply to your post that better describes
the problem.

Thanks,

A
Sep 9 '06 #7

a wrote:
> IDD_ADD_DUPLICATE
> .....
> CAPTION "Duplicate Entry Found"
> .....
> LTEXT "You are trying to add the entry""%s <%s>"" to your
> list, but ""%s <%s>"" already exists.\n\n\nWould you like to merge these two
> entries instead of creating a new one?",
> ......
> END
>
> My expression must match the first string ("Duplicate Entry Found") as well
> as the _entire_ string after LTEXT:

Okay, with more context, it looks like David Harmon's guess was right.

--

-- Pete

Author of "The Standard C++ Library Extensions: a Tutorial and Reference."
For more information about this book, see www.petebecker.com/tr1book.
Sep 9 '06 #8

a
Thanks! This works.

Now I only have to be able to mark the content of the outermost string
(without the enclosing quotes) and I'm done.

Thanks,

A
Sep 9 '06 #9
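
One possible way to mark the content without the enclosing quotes is to wrap the body in a capture group and then collapse each `""` to a single `"`. This is a sketch under assumptions, not the approach anyone in the thread settled on: it uses std::regex rather than boost::regex, and the function name `quoted_content` and its `found` out-parameter are invented for the example.

```cpp
#include <regex>
#include <string>

// Returns the content of the first quoted string in `text`, without the
// enclosing quotes and with each "" collapsed to a single ".
// Sets `found` to false when `text` holds no quoted string.
std::string quoted_content(const std::string& text, bool& found)
{
    // Group 1 captures everything between the outer quotes.
    static const std::regex rx(R"re("((?:[^"]|"")*)")re");
    static const std::regex doubled(R"re("")re");

    std::smatch m;
    found = std::regex_search(text, m, rx);
    if (!found)
        return std::string();
    return std::regex_replace(m[1].str(), doubled, "\"");
}
```

For the example from the first post, `"beginning ""nested quoted string"" end"`, this gives `beginning "nested quoted string" end`. If the doubled quotes should be kept as they are, return `m[1].str()` directly.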

On Sat, 09 Sep 2006 16:58:37 GMT in comp.lang.c++, "a"
<xx*****@pacbell.net> wrote,
> Now I only have to be able to mark the content of the outermost string
> (without the enclosing quotes) and I'm done.

This time I have to complain about ambiguity. What is the
"outermost" string? The even numbered captured pieces?

Sep 10 '06 #10
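
On the question of captured pieces: with std::regex, a capture group inside a repetition keeps only its last iteration, so `("[^"]*")*` does not hand back every piece. The sketch below assumes std::regex (behaviour of other libraries may differ) and uses hypothetical helper names. It shows the last-iteration capture and a second pass that lists all pieces.

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits a string already matched by ("[^"]*")* into its "..." pieces.
std::vector<std::string> pieces(const std::string& matched)
{
    static const std::regex piece("\"[^\"]*\"");
    std::vector<std::string> out;
    for (std::sregex_iterator it(matched.begin(), matched.end(), piece), end; it != end; ++it)
        out.push_back(it->str());
    return out;
}

// What capture group 1 of ("[^"]*")* holds after a full match:
// only the piece from the last iteration of the repetition.
std::string last_capture(const std::string& matched)
{
    static const std::regex whole("(\"[^\"]*\")*");
    std::smatch m;
    return std::regex_match(matched, m, whole) ? m[1].str() : std::string();
}
```

For `"I ""am"" a duck"` the pieces are `"I "`, `"am"` and `" a duck"`, and group 1 holds only the last of them. Stripping the quotes from each piece and joining the results with a single `"` reconstructs the unescaped content, `I "am" a duck`.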
