Regular expression

I am writing a function that does what a search engine does.
In a search engine, we can use double quotation marks to specify a phrase,
like "I love you".
How can I use a regular expression to separate each phrase?

Test case:
"I love" all "of you"
I would like it to return:
"I love", all, "of you"
Thank you!

Jun 1 '06 #1
I would make a pattern that matches spaces with an optional quoted
phrase, and split on that.

Some untested code, but it should get you started:

Regex re = new Regex(@" |(?: ?(""[^""]*"") ?)");
string[] splitted = re.Split(input);
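
For anyone trying the snippet above, here is a minimal runnable sketch of the same split approach. It relies on the documented .NET behaviour that Regex.Split includes the text of capturing groups in its result. The names SplitDemo and SplitPhrases are made up for illustration, and the filtering of empty strings is an addition: the split can leave empty pieces where two matches touch or where a match sits at the start or end of the input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Same pattern as above: split on a single space, or on a quoted phrase
    // with an optional space on either side. The quoted phrase sits in a
    // capturing group, so Regex.Split keeps it in the result.
    private static readonly Regex Separator =
        new Regex(@" |(?: ?(""[^""]*"") ?)");

    public static string[] SplitPhrases(string input)
    {
        return Separator.Split(input)
                        .Where(part => part.Length > 0) // drop empty pieces
                        .ToArray();
    }

    public static void Main()
    {
        // Prints one token per line: "I love", all, "of you"
        foreach (string part in SplitPhrases("\"I love\" all \"of you\""))
        {
            Console.WriteLine(part);
        }
    }
}
```

Unquoted text is split into single words here, because a plain space is itself a separator.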

Cylix wrote:
I am writing a function that does what a search engine does.
In a search engine, we can use double quotation marks to specify a phrase,
like "I love you".
How can I use a regular expression to separate each phrase?

Test case:
"I love" all "of you"
I would like it to return:
"I love", all, "of you"
Thank you!

Jun 1 '06 #2
Well, you've made the usual mistake of not defining your rules. An example
may imply some rules, but not others. For example, your example does not
state whether or not an odd number of double-quotes might be found in the
string. You have not specifically said whether or not double-quotes
surrounding a phrase must be included in the match, nor whether spaces
surrounding a phrase must be included in the match. There are a number of
other rules which are not specified as well, such as handling line breaks.

A regular expression is an expression of a set of rules which must be
absolutely specific.

However, I will give you a few examples that should cover the various
possibilities.

First, we are looking at 2 specific sets of rules:

1. A phrase surrounded by double-quotes.
2. A phrase *not* surrounded by double-quotes.

Therefore, in order to match them, we must either create 2 groups, or use
one group to split the total string into matches of the other. If we use 2
groups, we can get both, but we will have to sort out which is which. If we
only use one, we will need to perform 2 sets of operations:

1. Match all matches.
2. Split and get all remaining elements.

So, the rule for the phrases surrounded by quotes is fairly simple:

"[^"]*"

Translated, this says that a match is defined by a double-quote, followed by
zero or more non-double-quotes (any character except a double-quote),
followed by a double-quote. This will capture, in your example:

"I love"
"of you"
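
The two-operation approach mentioned earlier (match all matches, then split to get the remaining elements) could look roughly like this in C#. This is only a sketch: the class and method names are invented for the example, and trimming and dropping empty pieces in the second step is my assumption about what you would want.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedPhraseDemo
{
    // A double-quote, any run of non-quote characters, a closing double-quote.
    private static readonly Regex Quoted = new Regex(@"""[^""]*""");

    // Step 1: collect every quoted phrase, quotes included.
    public static List<string> QuotedPhrases(string input)
    {
        return Quoted.Matches(input)
                     .Cast<Match>()
                     .Select(m => m.Value)
                     .ToList();
    }

    // Step 2: split on the quoted phrases to get the text between them,
    // trimming whitespace and dropping empty pieces.
    public static List<string> UnquotedText(string input)
    {
        return Quoted.Split(input)
                     .Select(piece => piece.Trim())
                     .Where(piece => piece.Length > 0)
                     .ToList();
    }

    public static void Main()
    {
        string input = "\"I love\" all \"of you\"";
        Console.WriteLine(string.Join(" | ", QuotedPhrases(input)));
        Console.WriteLine(string.Join(" | ", UnquotedText(input)));
    }
}
```

Note that the two lists are built separately, so the original order of quoted and unquoted parts is lost with this approach.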

Now, if you create a rule that is the opposite of that, you get:

[^"]*

Translated, this says that a match is any run of zero or more characters
that are *not* double-quotes, so it can also match an empty string.

These 2 can be used together with grouping and an "or" ('|') operator, as
in:

("[^"]*")|([^"]*)

It is important to order them in this way, as the first group will capture
the quoted phrases, double-quotes included, and the second group will capture
anything *except* double-quotes. If the second group is used first, it will
capture the text inside the quotes without the double-quotes themselves, and
the first group will never match: [^"]* can match zero characters, so the
first alternative always succeeds before the quoted one is tried.
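
The ordering point can be checked with a small experiment. The sketch below is not part of the pattern above: it swaps the numbered groups for named groups (quoted and plain, names chosen for the example) so the same check works for either order, and it simply counts how often the quoted alternative is the one that matched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OrderDemo
{
    // Named groups make it easy to ask which alternative matched,
    // whichever order the alternatives are written in.
    public const string QuotedFirst = @"(?<quoted>""[^""]*"")|(?<plain>[^""]*)";
    public const string PlainFirst = @"(?<plain>[^""]*)|(?<quoted>""[^""]*"")";

    // Counts the matches in which the quoted-phrase alternative succeeded.
    public static int QuotedHits(string pattern, string input)
    {
        int hits = 0;
        foreach (Match m in Regex.Matches(input, pattern))
        {
            if (m.Groups["quoted"].Success)
            {
                hits++;
            }
        }
        return hits;
    }

    public static void Main()
    {
        string input = "\"I love\" all \"of you\"";
        Console.WriteLine(QuotedHits(QuotedFirst, input)); // both quoted phrases are found
        Console.WriteLine(QuotedHits(PlainFirst, input));  // the quoted group never matches
    }
}
```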

When using this version, every match fills exactly one of the two groups, so
the entire string is divided into matches, and you check which group
succeeded to identify which rule was matched (quoted in group 1 and
non-quoted in group 2). You should also note that the second group will
capture the spaces between the quoted phrases and the non-quoted phrases as
part of the non-quoted phrase, and that it can produce empty matches (for
example at the end of the string). I know of no way to trim this in the
regular expression itself, so you would have to trim the values from the
matches themselves and skip the empty ones.
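
Putting the combined pattern to work might look like the following sketch. The names PhraseTokenizer and Tokenize are hypothetical. The code checks which group succeeded for each match, trims the non-quoted text, and skips anything that is empty after trimming.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PhraseTokenizer
{
    // Group 1: quoted phrase (quotes included). Group 2: text between quotes.
    private static readonly Regex Combined =
        new Regex(@"(""[^""]*"")|([^""]*)");

    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in Combined.Matches(input))
        {
            if (m.Groups[1].Success)
            {
                // Quoted phrase: keep it exactly as matched.
                tokens.Add(m.Groups[1].Value);
            }
            else
            {
                // Non-quoted text: trim it and skip blank or empty matches.
                string text = m.Groups[2].Value.Trim();
                if (text.Length > 0)
                {
                    tokens.Add(text);
                }
            }
        }
        return tokens;
    }

    public static void Main()
    {
        // Prints: "I love", all, "of you"
        Console.WriteLine(string.Join(", ", Tokenize("\"I love\" all \"of you\"")));
    }
}
```

Several unquoted words in a row stay together as one token with this pattern. One possible variation, not taken from the explanation above, is to use [^\s"]+ as the second alternative: it matches one unquoted word at a time, so there is no surrounding whitespace left to trim.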

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Development Numbskull

Abnormality is anything but average.

"Göran Andersson" <gu***@guffa.com> wrote in message
news:ev*************@TK2MSFTNGP05.phx.gbl...
I would make a pattern that matches spaces with an optional quoted phrase,
and split on that.

Some untested code, but it should get you started:

Regex re = new Regex(@" |(?: ?(""[^""]*"") ?)");
string[] splitted = re.Split(input);

Jun 1 '06 #3
Was that a reply for me, or did you intend to reply to the original poster?

Jun 1 '06 #4
It was intended for the original poster, but I hit the reply button while
your message was open.

Sorry about any confusion.
--
HTH,

Kevin Spencer

Jun 1 '06 #5
Thank you for your help.
You solved my problem.

Jun 2 '06 #6
