
Regex: matching comma separated list?

Bob
I think this is very simple, but I am having difficulty doing it. Basically,
take a comma-separated list:
abc, def, ghi, jk

A list with only one token does not have any commas:
abc

The first character of each token (e.g. abc) must not be a digit. I am simply
trying to parse it to get an array of tokens:
abc
def
ghi
jk

...or for the single token one:
abc

I can easily do this with String.Replace and String.Split, but would like to
do this with regular expressions. Yet I cannot seem to get it to work, here
is what I have so far:

String input = "abc, def, ghi, jk";
String pattern = @"^((?<name>\D.*?)(\x2C )?)+?$";
Match match = Regex.Match(input, pattern, RegexOptions.ExplicitCapture);

Any input would be appreciated,

Thanks
Nov 17 '05 #1
5 Replies


I don't think regular expressions are the right tool for this job, Bob.
Regular expressions are used to search for patterns, that is, strings which
share certain characteristics in common but are not identical. In your
case, you want to convert a comma-delimited string into an array, and
String.Split() does just that.

--
HTH,

Kevin Spencer
Microsoft MVP
.NET Developer
A watched clock never boils.

Nov 17 '05 #2

In article <uL**************@TK2MSFTNGP09.phx.gbl>,
Bob <no****@nowhere.com> wrote:

: I think this is very simple but I am having difficult doing it. Basically
: take a comma separated list:
: abc, def, ghi, jk
:
: A list with only one token does not have any commas:
: abc
:
: The first letter of each token (abc) must not be a number. I am simply
: trying to parse it to get an array of tokens:
: abc
: def
: ghi
: jk
:
: ...or for the single token one:
: abc
:
: I can easily do this with String.Replace and String.Split, but would like to
: do this with regular expressions. Yet I cannot seem to get it to work, here
: is what I have so far:
:
: String input = "abc, def, ghi, jk";
: String pattern = @"^((?<name>\D.*?)(\x2C )?)+?$";
: Match match = Regex.Match(input, pattern, RegexOptions.ExplicitCapture);
:
: Any input would be appreciated,

Consider the following code:

// Needs: using System; using System.Text.RegularExpressions;
static void Main(string[] args)
{
    string[] inputs = new string[]
    {
        "abc, def, ghi, jk",
        "abc",
        "good, 1bad, good, 2bad",
        "trailingcomma,",
        ",",
        ",,",
        ",,,",
    };

    string pattern =
        @"^(
            (
                                  # ignore empties
              | (?<token>\D.*?)   # a token worth keeping
              | \d.*?             # or one to ignore
            )
            \s*                   # eat trailing whitespace
            (,\s*|$)              # separator or done
          )+$                     # catch a sequence of the above
        ";

    Regex tokens = new Regex(pattern, RegexOptions.IgnorePatternWhitespace);

    foreach (string input in inputs)
    {
        Match m = tokens.Match(input);

        Console.WriteLine("input = [" + input + "]:");
        if (m.Success)
        {
            // Group.Captures keeps every capture made by the repeated group.
            if (m.Groups["token"].Captures.Count > 0)
                foreach (Capture c in m.Groups["token"].Captures)
                    Console.WriteLine(" - [" + c.Value + "]");
            else
                Console.WriteLine(" - no captures");
        }
        else
            Console.WriteLine(" - no match.");
    }
}

Its output is

input = [abc, def, ghi, jk]:
- [abc]
- [def]
- [ghi]
- [jk]
input = [abc]:
- [abc]
input = [good, 1bad, good, 2bad]:
- [good]
- [good]
input = [trailingcomma,]:
- [trailingcomma]
input = [,]:
- no captures
input = [,,]:
- no captures
input = [,,,]:
- no captures

It's easy to anticipate Jon Skeet's objections to the regular
expression above, and he'd certainly be on solid ground. Passing the
result of a split through a filter would be much clearer, e.g.,

// Needs: using System; using System.Collections; using System.Text.RegularExpressions;
public static void ExtractGoodTokens(string[] inputs)
{
    // A good token starts with a non-digit character.
    Regex goodtoken = new Regex(@"^\D");

    foreach (string input in inputs)
    {
        ArrayList goodtokens = new ArrayList();

        // Split on commas, swallowing the whitespace around each one.
        foreach (string token in Regex.Split(input, @"\s*,\s*"))
            if (goodtoken.IsMatch(token))
                goodtokens.Add(token);

        Console.WriteLine("input = [" + input + "]:");
        if (goodtokens.Count > 0)
            foreach (string token in goodtokens)
                Console.WriteLine(" - [" + token + "]");
        else
            Console.WriteLine(" - none");
    }
}

Hope this helps,
Greg
--
I have felt for a long time that a talent for programming consists largely
of the ability to switch readily from microscopic to macroscopic views of
things, i.e., to change levels of abstraction fluently.
-- Donald E. Knuth, "Structured Programming with go to Statements"
Nov 17 '05 #3

How about

string[] aryList = strList.Split(new char[] {','});

???

--
HTH,

Kevin Spencer
Microsoft MVP
.NET Developer
A watched clock never boils.

Nov 17 '05 #4

On Sun, 30 Oct 2005 20:06:37 -0800, "Bob" <no****@nowhere.com> wrote:
I can easily do this with String.Replace and String.Split, but would like to
do this with regular expressions. Yet I cannot seem to get it to work, here
is what I have so far:

String input = "abc, def, ghi, jk";
String pattern = @"^((?<name>\D.*?)(\x2C )?)+?$";


This pattern is far from what you want.

First of all, it is easy to see that as you start with ^ and end with
$ you will always either match the complete string or nothing at all.

Secondly, a group doesn't expose multiple matches through its Value; it
holds only the last capture made during the match (the earlier ones are
still reachable through Group.Captures). All ExplicitCapture does is
make sure (\x2C ) as well as the outer parentheses don't count as
groups. The value of the "name" group will only contain the characters
captured on the last loop.

This leads to the third problem. As the regex is written, each pass
captures a single character and then simply loops and repeats.
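
The second and third points can be checked directly. The following is only a small sketch: the class and method names (CaptureDemo, MatchOriginal) are made up for illustration, and the pattern is Bob's, unchanged. It prints the group's value next to the full list of captures so the one-character-per-loop behaviour is visible.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; only Regex, Match, Group and Capture are real API.
public static class CaptureDemo
{
    public static Match MatchOriginal(string input)
    {
        // Bob's pattern from the question, unchanged.
        string pattern = @"^((?<name>\D.*?)(\x2C )?)+?$";
        return Regex.Match(input, pattern, RegexOptions.ExplicitCapture);
    }

    public static void Main()
    {
        Match m = MatchOriginal("abc, def, ghi, jk");
        Group name = m.Groups["name"];

        // Value reflects only the last capture of the repeated group.
        Console.WriteLine("success  = " + m.Success);
        Console.WriteLine("value    = [" + name.Value + "]");

        // Captures lists every capture, one per pass through the loop.
        Console.WriteLine("captures = " + name.Captures.Count);
        foreach (Capture c in name.Captures)
            Console.WriteLine(" - [" + c.Value + "]");
    }
}
```

With the sample input, each capture should be a single letter, and the group's value is just the final one.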

This is how it should be done:
(Using RegexOptions.IgnorePatternWhitespace)

string patternSplit =
    @"
    (?<=,|^)   # The character preceding the match is either a comma or
               # the beginning of the string

    \D.*?      # The string itself should be a non-digit followed by
               # any number of characters

    (?=,|$)    # The first character after the match should be a comma or
               # the end of the string
    ";

This will find all the valid substrings while ignoring those beginning
with a digit.

It will however not make a noise if the string consists of invalid
entries. For example "12abc,def,ghi" will return "def" and "ghi" as
the two matches while just ignoring 12abc.

If you need to validate that the string doesn't contain any invalid
entries, you will have to write a separate regular expression that
tries to match the entire string.
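
As a rough sketch of both halves: the lookaround pattern above run through Regex.Matches, plus one possible whole-string check. The validation pattern and all names here (TokenFinder, FindTokens, IsValid) are assumptions for illustration, not something given in this thread, and the check assumes entries without surrounding whitespace, as in the "12abc,def,ghi" example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class TokenFinder
{
    // The lookaround pattern from the post, written on one line.
    // .NET allows the variable-width lookbehind (?<=,|^).
    private static readonly Regex Token =
        new Regex(@"(?<=,|^)\D.*?(?=,|$)");

    // Assumed validation pattern: every entry is non-empty, contains no
    // comma, and starts with a non-digit.
    private static readonly Regex WholeList =
        new Regex(@"^[^\d,][^,]*(,[^\d,][^,]*)*$");

    public static List<string> FindTokens(string input)
    {
        List<string> found = new List<string>();
        foreach (Match m in Token.Matches(input))
            found.Add(m.Value);
        return found;
    }

    public static bool IsValid(string input)
    {
        return WholeList.IsMatch(input);
    }

    public static void Main()
    {
        string input = "12abc,def,ghi";
        Console.WriteLine("valid = " + IsValid(input));
        foreach (string token in FindTokens(input))
            Console.WriteLine(" - [" + token + "]");
    }
}
```

Note that \D also matches a space, so with an input such as "abc, def" the later matches keep their leading space; trimming each match (or allowing \s* after the comma in the lookbehind) would be needed for the original sample.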

--
Marcus Andrén
Nov 17 '05 #5

Forgot to add, remove the members that start with a number.
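
Put together, the split-then-filter idea might look like the sketch below. It is only one way to write it: the names (SplitFilter, GoodTokens) are invented here, and trimming each piece is an added assumption so that the ", " separators in the sample input don't leave leading spaces.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class; the names are illustrative only.
public static class SplitFilter
{
    public static List<string> GoodTokens(string list)
    {
        List<string> good = new List<string>();

        foreach (string raw in list.Split(new char[] { ',' }))
        {
            // Split leaves the space after each comma in place, so trim it.
            string token = raw.Trim();

            // Skip empty entries and entries that start with a digit.
            if (token.Length > 0 && !char.IsDigit(token[0]))
                good.Add(token);
        }

        return good;
    }

    public static void Main()
    {
        foreach (string token in GoodTokens("good, 1bad, good, 2bad"))
            Console.WriteLine(" - [" + token + "]");
    }
}
```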

--
HTH,

Kevin Spencer
Microsoft MVP
.NET Developer
A watched clock never boils.

Nov 17 '05 #6
