Bytes | Software Development & Data Engineering Community

Regex help needed...

JS
I am writing a C# app that needs to parse a sentence entered by the user
for a simple boolean search.
I need to capture all of the AND words that are not inside of double
quotes. However, I am having a heck of a time figuring out a regex for it.
Can anyone assist with a regex to find all the ANDs not in double quotes?

An example sentence might be:

red and blue and "crazy elephant" and "orange and red" and stuff.

I would need the 1st, 2nd, 3rd and 5th AND in the sentence, but not the 4th
one that is in "orange AND red".
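One single-regex way to do this is sketched below in Python's `re` module for brevity. .NET's `Regex` supports the same `\b`, `(?:...)` and `(?=...)` syntax. The sketch assumes the quotes in the input are balanced. It matches `and` as a whole word and then uses a lookahead to require an even number of double-quotes between that word and the end of the string. An AND inside a quoted phrase always has an odd number of quotes after it, so it is skipped. The names `UNQUOTED_AND` and `find_unquoted_ands` are hypothetical.

```python
import re

# Match "and" as a whole word, case-insensitively, but only when the rest of
# the string contains an even number of double-quotes (i.e. we are outside
# any quoted phrase). Assumes the quotes in the input are balanced.
UNQUOTED_AND = re.compile(r'\band\b(?=(?:[^"]*"[^"]*")*[^"]*$)', re.IGNORECASE)


def find_unquoted_ands(text):
    """Return the start index of every AND that is not inside double quotes."""
    return [m.start() for m in UNQUOTED_AND.finditer(text)]


if __name__ == "__main__":
    sentence = 'red and blue and "crazy elephant" and "orange and red" and stuff.'
    for pos in find_unquoted_ands(sentence):
        print(pos, sentence[pos:pos + 3])
```

On the example sentence this returns `[4, 13, 34, 55]`, which are the positions of the 1st, 2nd, 3rd and 5th ANDs. If the quotes can be unbalanced, the lookahead can give wrong answers, so the balance has to be checked before the regex is applied.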

I have several other parsing expressions in this program, but for some
reason, this particular regex eludes me, and I have been at it for some
time.

Any help would be appreciated.

TIA
-JS

PS: If there is a better Usenet group for this question, please advise, as
I could not find one just for regex.
Dec 2 '05 #1
You don't want a regular expression here. For example, since a human user is
typing the string, what happens when the user enters the following:

red and blue and "crazy elephant and "orange and red" and stuff.

Note that there are THREE double-quotes in the input. So, what's inside
double-quotes, and what is not? Is the "and" after "elephant" inside
double-quotes? Is the "and" between "orange" and "red" inside double-quotes?
Are both? Are neither?

Your only option here is to split the string on the double-quotes, and
then count. When you hit a double-quote, anything between it and the next
double-quote is "inside double-quotes." If there IS no next double-quote,
NOTHING after that unmatched double-quote is inside double-quotes.

You will need to split the string in order to parse it in any case.
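The split-and-count approach described above can be sketched as follows. The example is in Python for brevity, and the names `unquoted_segments` and `count_unquoted_ands` are hypothetical. After splitting on `"`, the odd-numbered pieces follow an opening quote. Under the rule above, a final piece that has no closing quote is treated as outside quotes.

```python
import re

WORD_AND = re.compile(r"\band\b", re.IGNORECASE)


def unquoted_segments(text):
    """Split on double-quotes and return the pieces that are NOT inside quotes.

    Odd-numbered pieces sit after an opening quote. If that opening quote has
    no matching closing quote (the piece is the last one), the text after it
    is treated as outside quotes.
    """
    parts = text.split('"')
    last = len(parts) - 1
    return [part for i, part in enumerate(parts) if i % 2 == 0 or i == last]


def count_unquoted_ands(text):
    """Count whole-word ANDs that lie outside double quotes."""
    return sum(len(WORD_AND.findall(seg)) for seg in unquoted_segments(text))
```

For the malformed input above, the pieces kept are `red and blue and `, `orange and red` and ` and stuff.`. Under this rule, the AND after `elephant` counts as quoted and the one in `orange and red` does not. Unlike a regex, the rule gives a definite answer for any input.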

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
You can lead a fish to a bicycle,
but you can't make it stink.

Dec 2 '05 #2
JS
After over an hour of working on this one, it came to me minutes after I
posted. Murphy's Law, I guess...

Anyway, in case anyone needs this, the answer is...

(?:".+?")?(\s+and\s+)(?:".+?")?
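The pattern above can be checked quickly in Python's `re` module, which should behave the same as .NET for these constructs. It finds the four expected ANDs in the example sentence. That works because each quoted phrase there sits right next to an AND and is consumed by one of the optional `".+?"` groups. When a quoted phrase is not next to an AND, the optional group is skipped and an AND inside the quotes is matched. The second test string and the helper name `posted_matches` are hypothetical.

```python
import re

# The pattern posted above; group 1 captures the " and " with its whitespace.
POSTED = re.compile(r'(?:".+?")?(\s+and\s+)(?:".+?")?')


def posted_matches(text):
    """Return the group-1 captures the posted pattern finds in text."""
    return [m.group(1) for m in POSTED.finditer(text)]


ok = 'red and blue and "crazy elephant" and "orange and red" and stuff.'
bad = 'blue "orange and red"'

print(len(posted_matches(ok)))  # 4: the expected ANDs
print(posted_matches(bad))      # [' and ']: the AND inside the quotes
```

Whether the pattern gives the right answer therefore depends on where the quoted phrases fall in the input, and not only on whether the quotes are balanced.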
Dec 2 '05 #3
JS
Thanks for the reply. You make a very good point about the quotes.
Such a string as you provided would not pass my initial validator. To
help prevent SQL injection, I do not allow the user to enter symbols
within the quoted sets. I also check for an even number of double
quotes. Both rules are explained to the end user in a syntax-validation
message. All symbols outside of the double quotes are allowed, but are
subsequently removed or replaced before this regex is applied.
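A validator following the two rules described above might look like the sketch below. It is written in Python, and `validate_query` is a hypothetical name. It assumes that "symbols" means anything other than ASCII letters, digits and spaces. Text outside the quotes is not checked, matching the description that those symbols are cleaned up later.

```python
import re

# Hypothetical validator modelled on the two rules described in the post:
#  1. the number of double-quotes must be even;
#  2. quoted phrases may contain only letters, digits and spaces (assumed
#     definition of "no symbols").
QUOTED = re.compile(r'"([^"]*)"')
PLAIN_PHRASE = re.compile(r"[A-Za-z0-9 ]*")


def validate_query(text):
    """Return a list of error messages; an empty list means the query passed."""
    errors = []
    if text.count('"') % 2 != 0:
        errors.append("Double quotes must come in pairs.")
        return errors  # phrase boundaries are ambiguous, so stop here
    for phrase in QUOTED.findall(text):
        if not PLAIN_PHRASE.fullmatch(phrase):
            errors.append(f'Quoted phrase "{phrase}" may contain only letters, digits and spaces.')
    return errors
```

Returning a list of messages instead of raising an exception lets all the problems be reported to the user in a single syntax message.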

Dec 2 '05 #4
Hi JS,

How many search engines have you seen that throw an exception or do not
allow certain characters to be input by the user? I haven't seen any. The
reason is that users are not always very smart, and they get discouraged
easily. It's more user-friendly to simply accept the input and deal with the
inconsistencies and possible attacks internally. Just a suggestion.
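One common way to deal with injection attempts internally, instead of rejecting characters, is to pass each search term to the database as a parameter rather than splicing it into the SQL text. The sketch below uses Python's built-in `sqlite3` module with an in-memory table. The table `docs`, its `body` column and the function `search_all` are hypothetical. In ADO.NET the same idea is expressed by adding parameters to `SqlCommand.Parameters`.

```python
import sqlite3


def search_all(conn, terms):
    """Return bodies of rows whose text contains every term (AND semantics).

    The SQL text is built only from fixed fragments; each user term is bound
    as a parameter, so quotes, semicolons and other symbols are treated as
    data, never as SQL. (LIKE wildcards such as % and _ in a term still act
    as wildcards.)
    """
    if not terms:
        return []
    where = " AND ".join("body LIKE ?" for _ in terms)
    params = ["%" + t + "%" for t in terms]
    sql = "SELECT body FROM docs WHERE " + where + " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO docs (body) VALUES (?)", [
    ("a crazy elephant in orange and red",),
    ("blue stuff",),
])
```

A term such as `x'; DROP TABLE docs; --` simply matches no rows, and the table is left intact.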

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
You can lead a fish to a bicycle,
but you can't make it stink.

Dec 2 '05 #5
