Wes:
Unfortunately, the new string doesn't work at all. Also, m.Groups[0].Value
still returns the string in quotes (using the original string you gave me).
I did try to figure out what that pattern is doing - whew! It uses a
character that isn't even documented in the doc I've been using - the "<"
char? I'm going by what's at:
http://msdn.microsoft.com/library/de...gexpsyntax.asp
At this point, this is mostly an intellectual exercise - I have it working
by trimming out the surrounding quotes. Just a little bit of a hack. If
you have something else for me to try, I'd love to try it. I used to be
competent with this stuff, in my old sed, awk, and lex days. But, it's been
a while. If you'd prefer to punt, that's fine, and thanks for all your help
so far.
- Dave
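A sketch of one way to make the regex itself drop the quotes (an assumption, not something proposed in this thread): give both alternatives the same named group. The .NET regex engine accepts a group name used more than once, so m.Groups["term"] holds whichever alternative matched, and no Trim is needed. QueryTerms and Extract are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QueryTerms
{
    // Both alternatives capture into the same group name, "term":
    //   "(?<term>[^"]+)"  a quoted phrase; the quotes are matched but not captured
    //   (?<term>\S+)      any run of non-whitespace characters
    // Only one alternative takes part in a given match, so "term" holds its text.
    private static readonly Regex TermPattern =
        new Regex("\"(?<term>[^\"]+)\"|(?<term>\\S+)");

    public static List<string> Extract(string searchText)
    {
        var terms = new List<string>();
        foreach (Match m in TermPattern.Matches(searchText))
        {
            terms.Add(m.Groups["term"].Value);
        }
        return terms;
    }
}
```

QueryTerms.Extract("\"two words\" and asdf") yields two words, and, asdf. Operator words such as and/or come back as ordinary terms; filtering them out is left to the caller.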
"Wes" <newsgroups@puzzleware.net> wrote in message
news:eaUT1L8tEHA.1228@TK2MSFTNGP10.phx.gbl...
> Hello Dave,
>
> It looks like I had a typo in my regular expression (an extra quote); here
> is the corrected version:
> (?:\"([^\"]+)\"|\\S+)
> but that isn't your problem.
>
> It looks like, from the example you have there, you are getting the value
> from m.ToString(). That actually returns the Value of group 0
> (m.Groups[0].Value), which is always the entire substring that was
> matched, quotes included. You can try m.Groups[1].Value, which will give
> you the string without quotes.
>
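One plausible reason the corrected pattern seems not to work at all: group 1 in (?:\"([^\"]+)\"|\\S+) only takes part when the quoted alternative matches. For a bare word such as and, m.Groups[1].Success is false and m.Groups[1].Value is an empty string, while m.Groups[0].Value (what m.ToString() returns) is always the full match, quotes included. A minimal sketch that falls back to the whole match for unquoted words; GroupDemo and Terms are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Wes's corrected pattern: a quoted phrase whose inner text is captured
    // in group 1, or a bare run of non-whitespace (which captures nothing).
    private static readonly Regex Pattern = new Regex("(?:\"([^\"]+)\"|\\S+)");

    public static List<string> Terms(string searchText)
    {
        var terms = new List<string>();
        foreach (Match m in Pattern.Matches(searchText))
        {
            // Groups[0] is always the whole match, quotes included.
            // Groups[1] is only successful when the quoted alternative matched;
            // for a bare word its Value is "".
            Group inner = m.Groups[1];
            terms.Add(inner.Success ? inner.Value : m.Value);
        }
        return terms;
    }
}
```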
> I just dug up a regular expression I used in the past to split a string at
> any whitespace, except whitespace that falls within quotes.
>
> using System;
> using System.Text.RegularExpressions;
>
> string searchText = "\"two words\" and asdf";
> string[] split = Regex.Split(searchText,
>     @"(?<!""\b[^""]*)\s+(?![^""]*\b"")");
> foreach (string s in split)
>     Console.WriteLine(s.Trim('"'));
>
> // Output
> two words
> and
> asdf
>
> It does, however, leave the quotes on the string, but that is taken care
> of with Trim. I think this may make your job a little easier (as long as
> you don't try to figure out exactly what that regular expression is
> doing; I still have trouble with it when I don't look at it for a while ;)
>
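On the "<" question: it belongs to the construct (?<! ... ), a negative lookbehind, which succeeds only if the text just before the current position does not match the enclosed pattern. Its siblings are (?<= ... ) (positive lookbehind) and the lookaheads (?= ... ) and (?! ... ). None of them consume characters, so Regex.Split removes only the \s+ in the middle. The .NET engine also allows a variable-length lookbehind such as [^"]*, which many other regex engines reject. Below is Wes's split wrapped in a hypothetical QuoteAwareSplit helper, with each piece annotated.

```csharp
using System.Text.RegularExpressions;

public static class QuoteAwareSplit
{
    // Split on whitespace that is not inside a quoted phrase.
    //   (?<!"\b[^"]*)  negative lookbehind: the whitespace must NOT be preceded by
    //                  a quote, a word boundary, then only non-quote characters
    //                  (i.e. we are not in the middle of "two words)
    //   \s+            the whitespace that is actually split on (and removed)
    //   (?![^"]*\b")   negative lookahead: the whitespace must NOT be followed by
    //                  non-quote characters, a word boundary, then a quote
    private static readonly Regex Separator =
        new Regex(@"(?<!""\b[^""]*)\s+(?![^""]*\b"")");

    public static string[] Split(string searchText)
    {
        string[] parts = Separator.Split(searchText);
        for (int i = 0; i < parts.Length; i++)
        {
            // The lookarounds keep quoted phrases together but leave the quotes on.
            parts[i] = parts[i].Trim('"');
        }
        return parts;
    }
}
```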
> HTH
> Wes Haggard
>
> http://weblogs.asp.net/whaggard/
>
> > Thanks, Wes!
> >
> > The regex string you gave me now solves the big problem - it returns
> > the entire "phrase" inside the quotes. It still does return the
> > quotes themselves, though. I can strip those out with a call to
> > Trim(), but that's a little bit of a hack. Can you figure out how to
> > tell it to strip the quotes for me?
> >
> > - Dave
> >
> > "Wes" <newsgroups@puzzleware.net> wrote in message
> > news:uQ8KB06tEHA.1472@TK2MSFTNGP10.phx.gbl...
> >
> >>> I'm struggling with something that should be fairly simple. I just
> >>> don't know the regex syntax very well, unfortunately.
> >>>
> >>> I'd like to parse words out of what is basically a boolean search
> >>> string. It's actually the input string into a Microsoft Index Server
> >>> search.
> >>>
> >>> The string will consist of words, perhaps enclosed in quotes or
> >>> parentheses. I'd like to use Regex to pull out the words, or the
> >>> phrases if the words are enclosed in quotes. Example
> >>>
> >>> The string: asdf or qwer or hjkl
> >>> should yield three results: asdf, qwer, hjkl
> >>> and:
> >>> "two words" and asdf
> >>> should yield two results: "two words", and "asdf"
> >>> There's the added complexity that the strings may have groups of
> >>> words surrounded by parentheses, but I think I can figure that out
> >>> if I solve the quoted strings problem.
> >>> I've tried a few things, but I can't manage to come up with
> >>> something that isn't returning the quotes in the return values.
> >>>
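For the parentheses, one possible extension (an assumption, not checked against Index Server's actual query grammar) is to return each parenthesis as its own token and stop bare words at parentheses and quotes. All three alternatives share the named group "term", relying on the same .NET feature that lets a group name appear more than once. QueryTokenizer and Tokenize are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QueryTokenizer
{
    // Alternatives are tried left to right at each position:
    //   "(?<term>[^"]+)"    a quoted phrase, quotes not captured
    //   (?<term>[()])       a single parenthesis as its own token
    //   (?<term>[^\s()"]+)  a bare word, ending at whitespace, a paren, or a quote
    private static readonly Regex Token = new Regex(
        "\"(?<term>[^\"]+)\"|(?<term>[()])|(?<term>[^\\s()\"]+)");

    public static List<string> Tokenize(string searchText)
    {
        var tokens = new List<string>();
        foreach (Match m in Token.Matches(searchText))
        {
            tokens.Add(m.Groups["term"].Value);
        }
        return tokens;
    }
}
```

QueryTokenizer.Tokenize("(asdf or \"two words\") and qwer") yields (, asdf, or, two words, ), and, qwer; grouping the tokens by parenthesis depth would be a separate step.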
> >>> Here's some code:
> >>>
> >>> Regex regEx = new Regex("([\"][^\"]+[\"]|\\S+)");
> >>>
> >>> string searchText = "\"two words\" and asdf";
> >>> foreach (Match m in regEx.Matches(searchText))
> >>> {
> >>> string text = m.ToString();
> >>> MessageBox.Show(text);
> >>> }
> >>> In the above code, it will pull out the words, but the text pulled
> >>> out includes the quotes in "two words".
> >>> I tried to tell it to match but ignore the quotes, using:
> >>> Regex regEx = new Regex("(?:(\"){1}[^\"]?:(\"){1}|\\S)+");
> >>> but that doesn't work either. Obviously I don't know what I'm
> >>> doing.
> >>> Please help!
> >>>
> >>> - Dave
> >>>
> >> Hello Dave,
> >>
> >> With "(?:(\"){1}[^\"]?:(\"){1}|\\S)+" you are saying don't capture
> >> the whole thing, i.e. by '(?:', but you are capturing both quotes
> >> individually with (\").
> >>
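Wes's point about capturing can be checked by comparing group numbers. In the sketch below (the class CaptureDemo and its field names are hypothetical), wrapping each quote in parentheses makes the quotes themselves groups 1 and 3; matching the quotes literally and parenthesizing only the phrase leaves the phrase alone in group 1; and an outer (?: ... ) adds no group at all.

```csharp
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // (")([^"]+)(") : each quote is its own capturing group, so the quotes
    // land in groups 1 and 3 and the phrase in group 2.
    public static readonly Regex QuotesCaptured = new Regex("(\")([^\"]+)(\")");

    // "([^"]+)" : the quotes are matched but not captured; only the text
    // between them lands in group 1.
    public static readonly Regex ContentCaptured = new Regex("\"([^\"]+)\"");

    // (?:"([^"]+)") : the outer (?: ) groups without capturing, so the
    // group count is the same as ContentCaptured (group 0 plus group 1).
    public static readonly Regex NonCapturingWrapper = new Regex("(?:\"([^\"]+)\")");
}
```

For the input "\"two words\"", QuotesCaptured.Match(...).Groups[2].Value and ContentCaptured.Match(...).Groups[1].Value are both two words, while Groups[0].Value is "two words" with its quotes in every case.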
> >> Try (?:"\"([^\"]+)\"|\\S+)
> >>
> >> This should capture only the text that has quotes around it,
> >> excluding the quotes themselves.
> >>
> >> HTH
> >> Wes Haggard
> >>
> >> http://weblogs.asp.net/whaggard/
>