Regex help, please - Recognize quoted strings?

I'm struggling with something that should be fairly simple. I just don't
know the regex syntax very well, unfortunately.

I'd like to parse words out of what is basically a boolean search string.
It's actually the input string into a Microsoft Index Server search.

The string will consist of words, perhaps enclosed in quotes or parentheses.
I'd like to use Regex to pull out the words, or the phrases if the words are
enclosed in quotes. Example:

The string: asdf or qwer or hjkl
should yield three results: asdf, qwer, hjkl

and:

"two words" and asdf
should yield two results: "two words", and "asdf"

There's the added complexity that the strings may have groups of words
surrounded by parentheses, but I think I can figure that out if I solve the
quoted strings problem.

I've tried a few things, but I can't manage to come up with something that
isn't returning the quotes in the return values.

Here's some code:

Regex regEx = new Regex("([\"][^\"]+[\"]|\\S+)");

string searchText = "\"two words\" and asdf";
foreach (Match m in regEx.Matches(searchText))
{
    string text = m.ToString();

    MessageBox.Show(text);
}

In the above code, it will pull out the words, but the text pulled out
includes the quotes in "two words".
I tried to tell it to match but ignore the quotes, using:
Regex regEx = new Regex("(?:(\"){1}[^\"]?:(\"){1}|\\S)+ ");

but that doesn't work either. Obviously I don't know what I'm doing.

Please help!

- Dave
Nov 16 '05 #1
Wes
Hello Dave,

With "(?:(\"){1}[^\"]?:(\"){1}|\\S)+ " you are telling it not to capture the whole thing (that is what '(?:' does), but you are still capturing both quotes individually with (\").

Try (?:"\"([^\"]+)\"|\\S+)

This should only capture the stuff with quotes around it, excluding the quotes.

HTH
Wes Haggard
http://weblogs.asp.net/whaggard/
Nov 16 '05 #2
Thanks, Wes!

The regex string you gave me now solves the big problem - it returns the
entire "phrase" inside the quotes. It still does return the quotes
themselves, though. I can strip those out with a call to Trim(), but that's
a little bit of a hack. Can you figure out how to tell it to strip the
quotes for me?

- Dave
Nov 16 '05 #3
Wes
Hello Dave,

It looks like I had a typo in my regular expression (an extra quote); here is the corrected version:
(?:\"([^\"]+)\"|\\S+)
but that isn't your problem.

From the example you posted, it looks like you are getting the value from m.ToString(). That actually returns the value of group 0 (m.Groups[0].Value), which by default is the entire substring that was matched. Try m.Groups[1].Value instead; that will give you the string without quotes.
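To make the group 0 versus group 1 difference concrete, here is a minimal sketch using the corrected pattern above. The class and method names (GroupDemo, WholeMatches, FirstGroups) are made up for illustration; only Regex, Match.Value and Match.Groups are real .NET members.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Corrected pattern from this post: a quoted phrase (captured without
    // its quotes) or a run of non-whitespace characters.
    private static readonly Regex Pattern = new Regex("(?:\"([^\"]+)\"|\\S+)");

    // Group 0 (m.Value, the same text m.ToString() returns) is the whole
    // match, so quoted phrases keep their quotes.
    public static List<string> WholeMatches(string input)
    {
        var result = new List<string>();
        foreach (Match m in Pattern.Matches(input))
            result.Add(m.Value);
        return result;
    }

    // Group 1 is only what the inner parentheses captured. It is empty
    // whenever the \S+ alternative matched instead.
    public static List<string> FirstGroups(string input)
    {
        var result = new List<string>();
        foreach (Match m in Pattern.Matches(input))
            result.Add(m.Groups[1].Value);
        return result;
    }

    public static void Main()
    {
        string searchText = "\"two words\" and asdf";
        Console.WriteLine(string.Join(" | ", WholeMatches(searchText)));
        Console.WriteLine(string.Join(" | ", FirstGroups(searchText)));
    }
}
```

Note that with this pattern group 1 comes back empty for unquoted words, so they need a fallback of some kind.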

I just dug up a regular expression I used in the past to split a string at any whitespace but not split if the string is within quotes.

string searchText = "\"two words\" and asdf";
string[] split = Regex.Split(searchText, @"(?<!""\b[^""]*)\s+(?![^""]*\b"")");
foreach (string s in split)
    Console.WriteLine(s.Trim('"'));

// Output
two words
and
asdf

It does, however, leave the quotes on the strings, but Trim takes care of that. I think this may make your job a little easier, as long as you don't try to figure out exactly what that regular expression is doing. I still have trouble with it when I haven't looked at it for a while ;)
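For the curious, here is an attempt at spelling that pattern out, as far as I can tell what it does. It is the same pattern, only laid out with RegexOptions.IgnorePatternWhitespace so each piece can carry a comment. The class name QuoteAwareSplit and its Split helper are invented for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteAwareSplit
{
    // Same pattern as above; whitespace and #-comments inside the pattern
    // are ignored because of IgnorePatternWhitespace.
    private static readonly Regex Splitter = new Regex(@"
        (?<! "" \b [^""]* )  # not inside a phrase: no quote+word char behind us with only non-quote text since
        \s+                  # the run of whitespace to split on
        (?! [^""]* \b "" )   # not inside a phrase: no word char+quote ahead with only non-quote text before it
        ", RegexOptions.IgnorePatternWhitespace);

    // Splits on whitespace outside quoted phrases, then trims the quotes
    // that the split leaves attached to each phrase.
    public static string[] Split(string input)
    {
        string[] parts = Splitter.Split(input);
        for (int i = 0; i < parts.Length; i++)
            parts[i] = parts[i].Trim('"');
        return parts;
    }

    public static void Main()
    {
        foreach (string part in Split("\"two words\" and asdf"))
            Console.WriteLine(part);
    }
}
```

Because both lookarounds lean on \b, a phrase that begins or ends with punctuation right inside the quotes, such as "(two words)", would likely still be split, so treat this as a sketch for simple input only.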

HTH
Wes Haggard
http://weblogs.asp.net/whaggard/
Nov 16 '05 #4
Wes:

Unfortunately, the new string doesn't work at all. Also, m.Groups[0].Value
still returns the string in quotes (using the original string you gave me).
I did try to figure out what that pattern is doing - whew! It uses a
character that isn't even documented in the doc I've been using - the "<"
char? I'm going by what's at:
http://msdn.microsoft.com/library/de...gexpsyntax.asp

At this point, this is mostly an intellectual exercise - I have it working
by trimming out the surrounding quotes. Just a little bit of a hack. If
you have something else for me to try, I'd love to try it. I used to be
competent with this stuff, in my old sed, awk, and lex days. But, it's been
a while. If you'd prefer to punt, that's fine, and thanks for all your help
so far.

- Dave
Nov 16 '05 #5
Wes
Hello Dave,
Comments inline.
> Wes:
>
> Unfortunately, the new string doesn't work at all.

Really? I tested it on the string you gave me and it worked for me; at least it matched quoted strings.
Anyway, here is a complete sample piece of code that matches quoted and non-quoted strings.

string searchText = "\"two words\" and asdf";
Regex regEx = new Regex("(?:\"([^\"]+)\"|(\\S+))");
foreach (Match m in regEx.Matches(searchText))
{
    // If quoted string
    string text = m.Groups[1].Value;

    // If non-quoted string
    if (text == string.Empty)
        text = m.Groups[2].Value;

    Console.WriteLine(text);
}

// Output
two words
and
asdf
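If the empty-string check feels clunky, one possible variation is to give both alternatives the same group name, which the .NET regex engine allows, so a single lookup returns whichever one matched. This is only a sketch: the group name "term" and the TermExtractor class are made up here, and it has not been tried against real Index Server queries.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TermExtractor
{
    // Both alternatives capture into the same named group "term":
    // the text between quotes, or a run of non-whitespace characters.
    private static readonly Regex Pattern =
        new Regex("\"(?<term>[^\"]+)\"|(?<term>\\S+)");

    // Returns each phrase or word without surrounding quotes.
    public static List<string> Extract(string input)
    {
        var terms = new List<string>();
        foreach (Match m in Pattern.Matches(input))
            terms.Add(m.Groups["term"].Value);
        return terms;
    }

    public static void Main()
    {
        foreach (string term in Extract("\"two words\" and asdf"))
            Console.WriteLine(term);
    }
}
```

Operator words such as "and" and "or" still come back as ordinary terms, so they would have to be filtered out separately if they are not wanted.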

> Also, m.Groups[0].Value still returns the string in quotes (using the
> original string you gave me).

m.Groups[1].Value should be the one with no quotes.

> I did try to figure out what that pattern is doing - whew! It uses a character that isn't even
> documented in the doc I've been using - the "<" char? I'm going by
> what's at:
> http://msdn.microsoft.com/library/de...ry/en-us/script56/html/jsgrpregexpsyntax.asp

FYI: the link you gave is for the VBScript regular expression syntax, which is not exactly the same as the .NET regular expression syntax. That can be found at
http://msdn.microsoft.com/library/de...geElements.asp (the (?<! ) construct is under the grouping constructs section; it is a negative lookbehind).
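
In case it helps anyone else puzzled by (?<! ): a negative lookbehind only succeeds when the text just before the current position does not match the given subpattern, and it consumes no characters itself. A tiny illustration, unrelated to the search-string problem; the sample text and the LookbehindDemo name are made up:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Matches a whole run of digits only when it is not immediately
    // preceded by a dollar sign.
    private static readonly Regex Pattern = new Regex(@"(?<!\$)\b\d+");

    public static List<string> Find(string input)
    {
        var found = new List<string>();
        foreach (Match m in Pattern.Matches(input))
            found.Add(m.Value);
        return found;
    }

    public static void Main()
    {
        // "15" is skipped because of the "$" in front of it; "20" is kept.
        foreach (string number in Find("cost $15 and 20 items"))
            Console.WriteLine(number);
    }
}
```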

> At this point, this is mostly an intellectual exercise - I have it
> working by trimming out the surrounding quotes.

I know, but part of the reason I help people with issues like this is so that I can stay intellectually sharp. ;) Plus I hate giving up before the objective is achieved.

> Just a little bit of
> a hack. If you have something else for me to try, I'd love to try it.
> I used to be competent with this stuff, in my old sed, awk, and lex
> days. But, it's been a while. If you'd prefer to punt, that's fine,
> and thanks for all your help so far.
>
> - Dave


I hope this is what you are looking for.

Wes Haggard
http://weblogs.asp.net/whaggard/
Nov 16 '05 #6
Wes:

That one almost works. It was the @"(?<!""\b[^""]*)\s+(?![^""]*\b"")" one
that I was referring to that didn't work.

The new one works.

Thanks!
Nov 16 '05 #7
