Regular expression to match a nested quoted string

a
I need to write a regular expression to match a quoted string in which the
double quote character itself is represented by 2 double quotes. For
example:

"beginning ""nested quoted string"" end"

Any idea how to write this in boost::xpressive or boost::regex?

Thanks,

A
Sep 9 '06 #1
"a" <xx*****@pacbell.net> wrote in message
news:X2*******************@newssvr11.news.prodigy.com...
> I need to write a regular expression to match a quoted string in which the
> double quote character itself is represented by 2 double quotes. For
> example:
>
> "beginning ""nested quoted string"" end"
>
> Any idea how to write this in boost::xpressive or boost::regex.
>
> Thanks,
>
> A

I'm not familiar with Boost, but in a C++ string literal you need to escape
each " as \". Try:
"beginning \"nested quoted string\" end"
Sep 9 '06 #2
a wrote:
> I need to write a regular expression to match a quoted string in which the
> double quote character itself is represented by 2 double quotes. For
> example:
>
> "beginning ""nested quoted string"" end"

Quotes aren't special characters in any of the regular expression
grammars supported by TR1, so the regular expression is

""(.*)""

To write that as a C++ text string, add escapes to the quotes:

const char *rx = "\"\"(.*)\"\"";

That is, unless you're using the basic or grep grammars. In those cases
you need \( instead of (, so the regular expression is

""\(.*\)""

and the corresponding literal constant has more escapes:

const char *rx = "\"\"\\(.*\\)\"\"";
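
The escaping described above can also be avoided with a raw string literal, assuming a C++11 or later compiler (not something this thread relies on). The sketch below writes both patterns each way; the variable names are made up for illustration. A custom delimiter (rx here) is needed because the patterns themselves contain the sequence )" that would end a plain R"(...)" literal.

```cpp
#include <string>

// ECMAScript-style pattern: two quotes, a capture group, two quotes.
const std::string escaped = "\"\"(.*)\"\"";
const std::string raw     = R"rx(""(.*)"")rx";

// basic/grep-style pattern, where the group is written \( ... \).
const std::string escaped_basic = "\"\"\\(.*\\)\"\"";
const std::string raw_basic     = R"rx(""\(.*\)"")rx";
```

Each raw literal holds exactly the same characters as its escaped counterpart, so either form can be passed to the regex constructor.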

--

-- Pete

Author of "The Standard C++ Library Extensions: a Tutorial and Reference."
For more information about this book, see www.petebecker.com/tr1book.
Sep 9 '06 #3
a
Thanks for your answers.

I'll clarify my question.

The issue for me is not how to escape quote characters in a C quoted string,
but rather what regular expression to use to match quoted strings that
contain other nested quoted strings.

The strings are coming from an external text file in Microsoft rc file
format used to describe resources used by an application. Here is a sample:

IDD_ADD_DUPLICATE
......
CAPTION "Duplicate Entry Found"
......
LTEXT "You are trying to add the entry""%s <%s>"" to your
list, but ""%s <%s>"" already exists.\n\n\nWould you like to merge these two
entries instead of creating a new one?",
.......
END

My expression must match the first string ("Duplicate Entry Found") as well
as the _entire_ string after LTEXT:

"You are trying to add the entry""%s <%s>"" to your list, but ""%s <%s>""
already exists.\n\n\nWould you like to merge these two entries instead of
creating a new one?"

If I use a simple regex for a quoted string, it will stop at the first " and
it will match only: "You are trying to add the entry".

So the question is how to make the expression recognize that one '"' is the
end of the string, but '""' is part of the string.

I tried using static regexes in boost::xpressive, but they generate runtime
stack overflows when I add patterns for the inner "" characters, so I'm
assuming the expressions are not correct.

Thanks,

A
Sep 9 '06 #4
On Sat, 09 Sep 2006 06:58:37 -0400 in comp.lang.c++, Pete Becker
<pe********@acm.org> wrote,
> a wrote:
>> I need to write a regular expression to match a quoted string in which the
>> double quote character itself is represented by 2 double quotes. For
>> example:
>>
>> "beginning ""nested quoted string"" end"
>
> Quotes aren't special characters in any of the regular expression
> grammars supported by TR1, so the regular expression is
>
> ""(.*)""

Close, but no cigar. That will match a string that has quotes at
the beginning and end, but the quotes inside (if any) might not be
doubled.

A simple quoted string "A" is matched by /"[^"]*"/,
or in C++, "\"[^\"]*\"".

A string with possible double quotes in it, "A""A", is just a series
of the above. So it is /("[^"]*")*/, or in C++, "(\"[^\"]*\")*".

#include <string>
#include <iostream>

// Leave out Boost.Regex's file-iterator support; it is not needed here.
#define BOOST_REGEX_NO_FILEITER
#include <boost/regex.hpp>

// Print and return whether the whole of `what` is a quoted string
// whose embedded quotes (if any) are doubled.
bool check(const std::string& what)
{
    static const boost::regex expr("(\"[^\"]*\")*");
    boost::smatch tokens;
    bool result = boost::regex_match(what, tokens, expr);
    std::cout << result << "\t(" << what << ")\n";
    return result;
}

int main()
{
    check("\"I am a duck\"");                   // match
    check("\"I am a \"\" duck\"");              // match
    check("\"I \"\"am\"\" a \"\" duck\"");      // match
    check("I am not a duck");                   // no match: not quoted
    check("\"I am \"not\" a duck\"");           // no match: inner quotes not doubled
    check("I am \"not\" a duck");               // no match: not quoted
    check("\"I am not \" a duck\"");            // no match: lone inner quote
}
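
To pull such strings out of a larger text, such as the rc sample earlier in the thread, the same idea can be used with a search instead of a whole-string match. What follows is only a sketch under assumptions: it uses std::regex from C++11 in place of boost::regex, the function name find_quoted is made up for illustration, and the pattern uses + rather than * so that an empty match is never reported.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every quoted string in `text`, treating "" as an embedded quote.
// Adjacent "..." pieces with nothing between them are read as one string.
std::vector<std::string> find_quoted(const std::string& text)
{
    // One or more back-to-back "..." pieces; [^"] also matches newlines.
    static const std::regex quoted("(?:\"[^\"]*\")+");
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), quoted), end;
         it != end; ++it)
    {
        found.push_back(it->str());
    }
    return found;
}
```

Called on the resource text, this should yield the CAPTION string and the entire LTEXT string as two separate entries, each still wrapped in its enclosing quotes.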

Sep 9 '06 #5
David Harmon wrote:
> On Sat, 09 Sep 2006 06:58:37 -0400 in comp.lang.c++, Pete Becker
> <pe********@acm.org> wrote,
>> a wrote:
>>> I need to write a regular expression to match a quoted string in which the
>>> double quote character itself is represented by 2 double quotes. For
>>> example:
>>>
>>> "beginning ""nested quoted string"" end"
>>
>> Quotes aren't special characters in any of the regular expression
>> grammars supported by TR1, so the regular expression is
>>
>> ""(.*)""
>
> Close, but no cigar. That will match a string that has quotes at
> the beginning and end, but the quotes inside (if any) might not be
> doubled.

You're reading far too much into a vague specification. The regular
expression I gave matches the nested quoted string in the example.

--

-- Pete

Sep 9 '06 #6
a
Actually David's regex does the job. It's my fault that my initial specs
weren't too clear - see my previous reply to your post that better describes
the problem.

Thanks,

A
Sep 9 '06 #7
a wrote:
> IDD_ADD_DUPLICATE
> .....
> CAPTION "Duplicate Entry Found"
> .....
> LTEXT "You are trying to add the entry""%s <%s>"" to your
> list, but ""%s <%s>"" already exists.\n\n\nWould you like to merge these two
> entries instead of creating a new one?",
> ......
> END
>
> My expression must match the first string ("Duplicate Entry Found") as well
> as the _entire_ string after LTEXT:

Okay, with more context, it looks like David Harmon's guess was right.

--

-- Pete

Sep 9 '06 #8
a
Thanks! This works.

Now I only have to be able to mark the content of the outermost string
(without the enclosing quotes) and I'm done.

Thanks,

A
Sep 9 '06 #9
On Sat, 09 Sep 2006 16:58:37 GMT in comp.lang.c++, "a"
<xx*****@pacbell.net> wrote,
> Now I only have to be able to mark the content of the outermost string
> (without the enclosing quotes) and I'm done.

This time I have to complain about ambiguity. What is the
"outermost" string? The even-numbered captured pieces?

Sep 10 '06 #10
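
The thread ends without settling what "outermost" means. One possible reading is: everything between the first and last quote, with each doubled quote collapsed to a single one. A minimal sketch of that reading follows. It assumes std::regex from C++11 rather than boost::regex, and the function name unquote is hypothetical. A single capture group wraps the whole body, so there are no numbered pieces to stitch together.

```cpp
#include <regex>
#include <string>

// If `quoted` is a well-formed quoted string, store its content in `content`
// (enclosing quotes removed, each "" collapsed to ") and return true.
// Otherwise return false and leave `content` untouched.
bool unquote(const std::string& quoted, std::string& content)
{
    // Group 1 is the body: runs of non-quote characters separated by "".
    static const std::regex whole("\"([^\"]*(?:\"\"[^\"]*)*)\"");
    static const std::regex doubled("\"\"");

    std::smatch m;
    if (!std::regex_match(quoted, m, whole))
        return false;

    content = std::regex_replace(m[1].str(), doubled, "\"");
    return true;
}
```

Under this reading, the input "beginning ""nested quoted string"" end" gives the content beginning "nested quoted string" end, while a string whose inner quotes are not doubled is rejected.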
