Regex: matching comma separated list?

Bob
I think this is very simple but I am having difficulty doing it. Basically
take a comma separated list:
abc, def, ghi, jk

A list with only one token does not have any commas:
abc

The first letter of each token (abc) must not be a number. I am simply
trying to parse it to get an array of tokens:
abc
def
ghi
jk

...or for the single token one:
abc

I can easily do this with String.Replace and String.Split, but would like to
do this with regular expressions. Yet I cannot seem to get it to work, here
is what I have so far:

String input = "abc, def, ghi, jk";
String pattern = @"^((?<name>\D.*?)(\x2C )?)+?$";
Match match = Regex.Match(input, pattern, RegexOptions.ExplicitCapture);

Any input would be appreciated,

Thanks
Nov 17 '05 #1
I don't think regular expressions are the right tool for this job, Bob.
Regular expressions are used to search for patterns, that is, strings which
share certain characteristics but are not identical. In your case, you want
to convert a comma-delimited string into an array, and String.Split() does
just that.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
A watched clock never boils.
Nov 17 '05 #2
Consider the following code:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string[] inputs = new string[]
        {
            "abc, def, ghi, jk",
            "abc",
            "good, 1bad, good, 2bad",
            "trailingcomma, ",
            ",",
            ",,",
            ",,,",
        };

        // Whitespace and # comments inside the pattern are ignored
        // because of RegexOptions.IgnorePatternWhitespace below.
        string pattern =
            @"^(
                (
                  |                  # ignore empties
                  (?<token>\D.*?)    # a token worth keeping
                  |\d.*?             # or one to ignore
                )
                \s*                  # eat trailing whitespace
                (,\s*|$)             # separator or done
              )+$                    # catch a sequence of the above
            ";

        Regex tokens = new Regex(pattern, RegexOptions.IgnorePatternWhitespace);

        foreach (string input in inputs)
        {
            Match m = tokens.Match(input);

            Console.WriteLine("input = [" + input + "]:");
            if (m.Success)
            {
                // Captures holds every substring the group matched,
                // not just the last one.
                if (m.Groups["token"].Captures.Count > 0)
                    foreach (Capture c in m.Groups["token"].Captures)
                        Console.WriteLine(" - [" + c.Value + "]");
                else
                    Console.WriteLine(" - no captures");
            }
            else
                Console.WriteLine(" - no match.");
        }
    }
}

Its output is

input = [abc, def, ghi, jk]:
 - [abc]
 - [def]
 - [ghi]
 - [jk]
input = [abc]:
 - [abc]
input = [good, 1bad, good, 2bad]:
 - [good]
 - [good]
input = [trailingcomma, ]:
 - [trailingcomma]
input = [,]:
 - no captures
input = [,,]:
 - no captures
input = [,,,]:
 - no captures

It's easy to anticipate Jon Skeet's objections to the regular
expression above, and he'd certainly be on solid ground. Passing the
result of a split through a filter would be much clearer, e.g.,

// Needs: using System; using System.Collections;
//        using System.Text.RegularExpressions;
public static void ExtractGoodTokens(string[] inputs)
{
    // A good token is one whose first character is not a digit.
    Regex goodtoken = new Regex(@"^\D");

    foreach (string input in inputs)
    {
        ArrayList goodtokens = new ArrayList();

        // Split on commas, swallowing the whitespace around each one.
        foreach (string token in Regex.Split(input, @"\s*,\s*"))
            if (goodtoken.IsMatch(token))
                goodtokens.Add(token);

        Console.WriteLine("input = [" + input + "]:");
        if (goodtokens.Count > 0)
            foreach (string token in goodtokens)
                Console.WriteLine(" - [" + token + "]");
        else
            Console.WriteLine(" - none");
    }
}

Hope this helps,
Greg
--
I have felt for a long time that a talent for programming consists largely
of the ability to switch readily from microscopic to macroscopic views of
things, i.e., to change levels of abstraction fluently.
-- Donald E. Knuth, "Structured Programming with go to Statements"
Nov 17 '05 #3
How about

string[] aryList = strList.Split(new char[] {','});

???

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
A watched clock never boils.
Nov 17 '05 #4
On Sun, 30 Oct 2005 20:06:37 -0800, "Bob" <no****@nowhere.com> wrote:
I can easily do this with String.Replace and String.Split, but would like to
do this with regular expressions. Yet I cannot seem to get it to work, here
is what I have so far:

String input = "abc, def, ghi, jk";
String pattern = @"^((?<name>\D.*?)(\x2C )?)+?$";

This pattern is far from what you want.

First of all, because the pattern starts with ^ and ends with $, it
will always match either the complete string or nothing at all.

Secondly, a group's Value doesn't hold multiple matches; it only gives
the last capture made within the match (the earlier ones are reachable
only through the group's Captures collection). All ExplicitCapture does
is make sure that (\x2C ) and the outer parentheses don't count as
groups. The Value of the "name" group will only contain the characters
captured on the last loop.

This leads to the third problem. As the regex is written, it will
capture a single character and then simply loop and repeat.
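
To see the second and third points in practice, here is a minimal sketch that runs Bob's pattern unchanged and prints what the "name" group ends up holding. The class and method names (CaptureDemo, NameGroup) are made up for illustration and are not from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Bob's original pattern, copied as posted.
    public const string Pattern = @"^((?<name>\D.*?)(\x2C )?)+?$";

    // Hypothetical helper: returns the "name" group for one input string.
    public static Group NameGroup(string input)
    {
        Match match = Regex.Match(input, Pattern, RegexOptions.ExplicitCapture);
        return match.Groups["name"];
    }

    public static void Main()
    {
        Group name = NameGroup("abc, def, ghi, jk");

        // Value exposes only the last capture of the group.
        Console.WriteLine("Value: [" + name.Value + "]");

        // Captures exposes every capture made while the group looped.
        foreach (Capture c in name.Captures)
            Console.WriteLine(" capture: [" + c.Value + "]");
    }
}
```

On the sample input this should report a Value of k and one single-character capture per letter, which is the "capture one character, then loop" behaviour described above.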

This is how it should be done:
(Using RegexOptions.IgnorePatternWhitespace)

string patternSplit =
@"
(?<=,|^)  # The character preceding the match is either a comma or
          # the beginning of the string

\D.*?     # The string itself should be a non-digit followed by
          # any number of characters

(?=,|$)   # The first character after the match should be a comma or
          # the end of the string
";

This will find all the valid substrings while ignoring those beginning
with a digit. Note that it does not skip the whitespace after a comma,
so a match can start with a space (and a token such as " 1bad" that
follows a space is then not filtered out).

It will not, however, complain if the string contains invalid entries.
For example, "12abc,def, ghi" will return "def" and " ghi" as the two
matches while just ignoring 12abc.

If you need to validate that the string doesn't contain any invalid
entries, you will have to write a separate regular expression that
tries to capture the entire string.
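
A rough sketch of how the lookaround idea could be used with Regex.Matches, together with a separate whole-string check. Both patterns are assumptions of this sketch and not the exact pattern posted above: the token pattern adds \s* and a named group so the space after a comma is never taken as the start of a token, and the validation pattern is just one possible way to test the entire string. The names TokenFinder, FindTokens and IsValidList are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenFinder
{
    // Variant of the lookaround pattern: a token must follow the start of
    // the string or a comma, must not begin with a digit, whitespace or a
    // comma, and runs up to the next comma or the end of the string.
    static readonly Regex Token = new Regex(
        @"(?<=^|,)\s*(?<token>[^\d\s,][^,]*?)\s*(?=,|$)");

    // Separate whole-string check: every entry must be a valid token.
    static readonly Regex WholeList = new Regex(
        @"^\s*[^\d\s,][^,]*(,\s*[^\d\s,][^,]*)*$");

    // Returns the valid tokens, silently skipping the invalid ones.
    public static List<string> FindTokens(string input)
    {
        List<string> result = new List<string>();
        foreach (Match m in Token.Matches(input))
            result.Add(m.Groups["token"].Value);
        return result;
    }

    // True only if the list contains no invalid or empty entries.
    public static bool IsValidList(string input)
    {
        return WholeList.IsMatch(input);
    }

    public static void Main()
    {
        foreach (string token in FindTokens("12abc,def, ghi"))
            Console.WriteLine("[" + token + "]");

        Console.WriteLine(IsValidList("abc, def, ghi, jk"));
        Console.WriteLine(IsValidList("12abc,def, ghi"));
    }
}
```

Keeping extraction and validation in two patterns follows the advice above: the first stays simple and forgiving, and the second answers only the yes/no question of whether anything invalid is present.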

--
Marcus Andrén
Nov 17 '05 #5
Forgot to add, remove the members that start with a number.
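
For comparison, a minimal sketch of the split-then-filter route without any regular expressions. It assumes that surrounding whitespace should be trimmed and that empty entries should be dropped; the names SplitFilter and GoodTokens are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class SplitFilter
{
    // Splits on commas, trims each piece, and keeps only the pieces
    // that are non-empty and do not start with a digit.
    public static List<string> GoodTokens(string list)
    {
        List<string> good = new List<string>();
        foreach (string part in list.Split(','))
        {
            string token = part.Trim();
            if (token.Length > 0 && !char.IsDigit(token[0]))
                good.Add(token);
        }
        return good;
    }

    public static void Main()
    {
        foreach (string token in GoodTokens("good, 1bad, good, 2bad"))
            Console.WriteLine("[" + token + "]");
    }
}
```

The Trim call matters because Split(',') on its own leaves the space after each comma attached to the following token.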

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
A watched clock never boils.
Nov 17 '05 #6
