Regex: matching comma separated list?

Bob
I think this is very simple, but I am having difficulty doing it. Basically,
take a comma-separated list:
abc, def, ghi, jk

A list with only one token does not have any commas:
abc

The first letter of each token (abc) must not be a number. I am simply
trying to parse it to get an array of tokens:
abc
def
ghi
jk

...or for the single token one:
abc

I can easily do this with String.Replace and String.Split, but would like to
do this with regular expressions. Yet I cannot seem to get it to work. Here
is what I have so far:

String input = "abc, def, ghi, jk";
String pattern = @"^((?<name>\D.*?)(\x2C )?)+?$";
Match match = Regex.Match(input, pattern, RegexOptions.ExplicitCapture);

Any input would be appreciated,

Thanks
Nov 17 '05 #1
I don't think regular expressions are the right tool for this job, Bob.
Regular expressions are used to search for patterns, that is, strings that
share certain characteristics but are not identical. In your case, you want
to convert a comma-delimited string into an array, and String.Split() does
just that.
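For reference, a minimal sketch of the String.Split route in C#. It assumes commas are the only separators; the helper name SplitList is hypothetical, and the Trim call is an addition, since Split alone leaves the space after each comma on the following token (" def" rather than "def"):

```csharp
using System;

public static class ListSplitter
{
    // Hypothetical helper: splits "abc, def, ghi, jk" into its tokens.
    // A single token with no commas ("abc") comes back as a one-element array.
    public static string[] SplitList(string input)
    {
        string[] parts = input.Split(',');

        // String.Split keeps the space that follows each comma, so trim
        // every part before handing the array back.
        for (int i = 0; i < parts.Length; i++)
            parts[i] = parts[i].Trim();

        return parts;
    }
}
```

SplitList("abc") returns a one-element array, so the single-token case needs no special handling. Filtering out tokens that start with a digit would be a separate step on the resulting array.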

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
A watched clock never boils.

Nov 17 '05 #2

Consider the following code:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string[] inputs = new string[]
        {
            "abc, def, ghi, jk",
            "abc",
            "good, 1bad, good, 2bad",
            "trailingcomma,",
            ",",
            ",,",
            ",,,",
        };

        // With IgnorePatternWhitespace, the layout and the # comments
        // below are ignored by the regex engine.
        string pattern =
            @"^(
                (
                    | # ignore empties
                    (?<token>\D.*?) # a token worth keeping
                    |\d.*? # or one to ignore
                )
                \s* # eat trailing whitespace
                (,\s*|$) # separator or done
            )+$ # catch a sequence of the above
            ";

        Regex tokens = new Regex(pattern, RegexOptions.IgnorePatternWhitespace);

        foreach (string input in inputs)
        {
            Match m = tokens.Match(input);

            Console.WriteLine("input = [" + input + "]:");
            if (m.Success)
            {
                // Captures holds the token from every loop iteration,
                // not just the last one.
                if (m.Groups["token"].Captures.Count > 0)
                    foreach (Capture c in m.Groups["token"].Captures)
                        Console.WriteLine(" - [" + c.Value + "]");
                else
                    Console.WriteLine(" - no captures");
            }
            else
                Console.WriteLine(" - no match.");
        }
    }
}

Its output is

input = [abc, def, ghi, jk]:
- [abc]
- [def]
- [ghi]
- [jk]
input = [abc]:
- [abc]
input = [good, 1bad, good, 2bad]:
- [good]
- [good]
input = [trailingcomma,]:
- [trailingcomma]
input = [,]:
- no captures
input = [,,]:
- no captures
input = [,,,]:
- no captures

It's easy to anticipate Jon Skeet's objections to the regular
expression above, and he'd certainly be on solid ground. Passing the
result of a split through a filter would be much clearer, e.g.,

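using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TokenExtractor
{
    public static void ExtractGoodTokens(string[] inputs)
    {
        // A good token starts with a non-digit; empty pieces never match.
        Regex goodtoken = new Regex(@"^\D");

        foreach (string input in inputs)
        {
            List<string> goodtokens = new List<string>();

            // Split on commas, absorbing any whitespace around them,
            // then keep only the pieces that pass the filter.
            foreach (string token in Regex.Split(input, @"\s*,\s*"))
                if (goodtoken.IsMatch(token))
                    goodtokens.Add(token);

            Console.WriteLine("input = [" + input + "]:");
            if (goodtokens.Count > 0)
                foreach (string token in goodtokens)
                    Console.WriteLine(" - [" + token + "]");
            else
                Console.WriteLine(" - none");
        }
    }
}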

Hope this helps,
Greg
--
I have felt for a long time that a talent for programming consists largely
of the ability to switch readily from microscopic to macroscopic views of
things, i.e., to change levels of abstraction fluently.
-- Donald E. Knuth, "Structured Programming with go to Statements"
Nov 17 '05 #3
How about

string[] aryList = strList.Split(new char[] {','});

???

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
A watched clock never boils.

Nov 17 '05 #4
On Sun, 30 Oct 2005 20:06:37 -0800, "Bob" <no****@nowhere.com> wrote:
: I can easily do this with String.Replace and String.Split, but would like to
: do this with regular expressions. Yet I cannot seem to get it to work, here
: is what I have so far:
:
: String input = "abc, def, ghi, jk";
: String pattern = @"^((?<name>\D.*?)(\x2C )?)+?$";


This pattern is far from what you want.

First of all, because the pattern starts with ^ and ends with $, it will
always either match the complete string or nothing at all.
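A quick way to see the anchoring effect, using a deliberately simple pattern (\w+) rather than Bob's; CountMatches is a hypothetical helper that just counts what Regex.Matches finds:

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: counts the non-overlapping matches of a pattern.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }
}
```

Under these assumptions, CountMatches("abc, def", @"^\w+$") is 0, CountMatches("abc", @"^\w+$") is 1, and the unanchored CountMatches("abc, def", @"\w+") is 2: the anchored pattern can only ever produce a single whole-string match, so extracting several tokens has to happen inside that one match.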

Secondly, a group's Value doesn't hold multiple matches; it only stores the
last capture made in the given regular expression match. All ExplicitCapture
does is make sure that (\x2C ) as well as the outer parentheses don't count
as groups. The "name" group's Value will therefore only contain the
characters captured on the last loop (the earlier captures survive only in
Group.Captures, which is what Greg's code iterates over).

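This leads to the third problem. Because both .*? and +? are lazy, each
iteration takes as little as possible: the one character matched by \D,
plus the ", " separator when one follows. So as the regex is written, it
will capture a single character and then simply loop and repeat.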
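A sketch that runs Bob's pattern unchanged against his sample input, to make the second and third points visible. The class and method names are hypothetical; the pattern and the ExplicitCapture option are the ones from the question:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OriginalPatternDemo
{
    // Bob's pattern, copied unchanged from the question.
    const string Pattern = @"^((?<name>\D.*?)(\x2C )?)+?$";

    static Match Run(string input)
    {
        return Regex.Match(input, Pattern, RegexOptions.ExplicitCapture);
    }

    // Group.Value: only the capture from the last loop iteration.
    public static string LastName(string input)
    {
        return Run(input).Groups["name"].Value;
    }

    // Group.Captures: one entry per loop iteration, in order.
    public static List<string> AllNames(string input)
    {
        var names = new List<string>();
        foreach (Capture c in Run(input).Groups["name"].Captures)
            names.Add(c.Value);
        return names;
    }
}
```

For "abc, def, ghi, jk" the match succeeds, but LastName returns just "k", and AllNames returns eleven one-letter captures (a through k); the ", " separators are consumed by (\x2C )? and never show up in the group.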

This is how it should be done (using RegexOptions.IgnorePatternWhitespace,
and Regex.Matches so that each valid token comes back as its own match):

string patternSplit =
    @"
    (?<=,|^)   #The character preceding the match is either a comma or
               #the beginning of the string

    \D.*?      #The string itself should be a non-digit followed by
               #any number of characters

    (?=,|$)    #The first character after the match should be , or
               #the end of the string
    ";

This will find all the valid substrings while ignoring those beginning
with a digit.

It will not, however, complain if the string contains invalid entries. For
example, "12abc,def,ghi" will return "def" and "ghi" as the two matches
while silently ignoring 12abc.
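A sketch of the lookaround approach in C#, with one caveat: with Bob's spaced input, the space after each comma is itself a non-digit, so the \D above would accept " 1bad" in "good, 1bad". The variant below is a modification, not Marcus's exact expression: it skips whitespace after the comma and requires the first token character to be neither a digit, whitespace, nor a comma. Class and method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundTokens
{
    // Modified lookbehind/lookahead pattern (an assumption, not the original):
    // the token follows the start of the string or a comma, optional
    // whitespace is skipped, the first token character is not a digit,
    // whitespace, or comma, and a comma or the end of the string follows
    // (after optional trailing whitespace).
    static readonly Regex Token = new Regex(
        @"(?<=^|,)\s*(?<token>[^\d\s,][^,]*?)\s*(?=,|$)");

    public static List<string> ExtractTokens(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in Token.Matches(input))
            tokens.Add(m.Groups["token"].Value);
        return tokens;
    }
}
```

Under these assumptions, ExtractTokens("12abc,def,ghi") gives def and ghi, as described above, and ExtractTokens("good, 1bad, good, 2bad") gives good and good. Like the original, it still silently skips invalid entries rather than reporting them.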

If you need to validate that the string doesn't contain any invalid
entries, you will have to write a separate regular expression that tries
to match the entire string.
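One possible whole-string validator, sketched under stated assumptions: a token is any run of non-comma characters whose first character is not a digit, whitespace, or comma, and whitespace around the commas is allowed. ListValidator and IsValidList are hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class ListValidator
{
    // One token: first character not a digit, whitespace, or comma;
    // the rest is anything up to the next comma.
    const string TokenPart = @"[^\d\s,][^,]*";

    // Whole string: a token, then zero or more "comma, token" pairs, with
    // optional whitespace around each comma. Anchored at both ends so
    // nothing else may appear.
    static readonly Regex WholeList =
        new Regex(@"^\s*" + TokenPart + @"(\s*,\s*" + TokenPart + @")*$");

    public static bool IsValidList(string input)
    {
        return WholeList.IsMatch(input);
    }
}
```

With these rules, "abc, def, ghi, jk" and "abc" are accepted, while "12abc,def,ghi", "good, 1bad" and "abc," (trailing comma) are rejected. A caller could run this check first and only then extract the tokens.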

--
Marcus Andrén
Nov 17 '05 #5
Forgot to add: after splitting, remove the members that start with a number.
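A sketch of that second step, assuming the array comes straight from Split(new char[] {','}), so each piece after the first still carries the leading space; RemoveNumericTokens is a hypothetical name:

```csharp
using System.Collections.Generic;

public static class TokenFilter
{
    // Takes the raw pieces from String.Split(','), trims the space that
    // follows each comma, and drops empty pieces and pieces whose first
    // character is a digit.
    public static List<string> RemoveNumericTokens(string[] pieces)
    {
        var kept = new List<string>();
        foreach (string piece in pieces)
        {
            string token = piece.Trim();
            if (token.Length > 0 && !char.IsDigit(token[0]))
                kept.Add(token);
        }
        return kept;
    }
}
```

For example, RemoveNumericTokens("good, 1bad, good, 2bad".Split(',')) keeps "good" twice. Trimming before the check matters: without it, " 1bad" starts with a space and would pass a first-character digit test.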

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
A watched clock never boils.

Nov 17 '05 #6
