RegEx problem

jac
Guest
 
Posts: n/a
#1: Jun 28 '07
Hi,


I have a problem with the following code and can't find the bug:

// Set [8,9,54]
ArrayList aArray = new ArrayList();
regStr = new Regex(@"\[(?:(\d+)[,]?)*(\d+)\]");
if (text != null && regStr.IsMatch(text))
{
    Match m = regStr.Match(text);
    GroupCollection groups = m.Groups;
    number = 0;
    for (int i = 1; i < groups.Count; i++)
    {
        foreach (Capture c in groups[i].Captures)
        {
            aArray.Add(c.Value.ToString());
            number++;
        }
    }
}

[8,9] : this works, aArray contains 8 and 9
[16,5] : OK, I get 16 and 5
[16,34] : not OK, I get 3 items in my array: 16, 3 and 4
[16] : not OK, I get 2 items in my array: 1 and 6

Why does m.Groups give me 3 items for [16,34]? And likewise, why does it give
me 2 items for [16]?
I think it must be the last part of my regex, (\d+). This should be one
group even if the number has several digits. How can I solve this?

Thanks in advance,
jac

Christof Nordiek
Guest
 
Posts: n/a
#2: Jun 28 '07

re: RegEx problem


"jac" <jac@discussions.microsoft.com> wrote in message
news:0078CDB6-40A2-43C6-B0C6-514DB7807487@microsoft.com...
Quote:
> regStr = new Regex(@"\[(?:(\d+)[,]?)*(\d+)\]");
Why the '?' behind '[,]'?
It allows the pattern to match only part of a number and put the rest in the
next number.
And why the brackets around the comma?
They seem superfluous to me.
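
To see the effect of the optional comma, here is a minimal sketch that runs the pattern from the question unchanged and lists every capture of groups 1 and 2. The class and method names (OptionalCommaDemo, Captures) are made up for illustration; only the pattern itself comes from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OptionalCommaDemo
{
    // The pattern from the question, unchanged.
    private static readonly Regex Original =
        new Regex(@"\[(?:(\d+)[,]?)*(\d+)\]");

    // Collects every capture of groups 1 and 2, in the same order
    // as the loop in the question (group 0, the whole match, is skipped).
    public static List<string> Captures(string text)
    {
        var values = new List<string>();
        Match m = Original.Match(text);
        if (!m.Success)
        {
            return values;
        }
        for (int i = 1; i < m.Groups.Count; i++)
        {
            foreach (Capture c in m.Groups[i].Captures)
            {
                values.Add(c.Value);
            }
        }
        return values;
    }

    public static void Main()
    {
        // Because the comma is optional, the repeated group may stop in the
        // middle of a number and leave the last digit for the final (\d+).
        foreach (string input in new[] { "[8,9]", "[16,34]", "[16]" })
        {
            Console.WriteLine(input + " -> " + string.Join(" | ", Captures(input)));
        }
        // [16,34] -> 16 | 3 | 4
        // [16] -> 1 | 6
    }
}
```

This should reproduce the results reported in the question: the trailing (\d+) must match at least one digit, so the engine backtracks into the last number and splits it.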

Christof


Jesse Houwing
Guest
 
Posts: n/a
#3: Jun 28 '07

re: RegEx problem


* jac wrote, On 28-6-2007 17:26:
Quote:
> regStr = new Regex(@"\[(?:(\d+)[,]?)*(\d+)\]");
> How can I solve this?

\[(?<number>\d+)(?:,(?<number>\d+))*\]

should do the trick. Your current pattern leaves too many options open,
because both the comma and the whole first group are optional (which they
shouldn't be).

The new expression reads

find a [
find a number (one or more digits)
optionally find a comma followed by a number
repeat optional group if possible
find a ]

Both numbers are captured in the same named group, which makes it easier
to extract the values:

Match m = regStr.Match(text);
foreach (Capture c in m.Groups["number"].Captures)
{
    aArray.Add(c.Value);
}

number = aArray.Count;
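
Put together as a small self-contained program, the suggestion above could look roughly like this. It is only a sketch: the class and method names (NamedGroupDemo, Parse) are hypothetical, and a List<string> stands in for the ArrayList used in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // One number, then zero or more ",number" repetitions, all inside [ ].
    // Both (?<number>...) groups share a name, so their captures are merged.
    private static readonly Regex NumberList =
        new Regex(@"\[(?<number>\d+)(?:,(?<number>\d+))*\]");

    // Returns the numbers of the first bracketed list found in text,
    // or an empty list if text is null or contains no such list.
    public static List<string> Parse(string text)
    {
        var values = new List<string>();
        if (text == null)
        {
            return values;
        }
        Match m = NumberList.Match(text);
        if (!m.Success)
        {
            return values;
        }
        foreach (Capture c in m.Groups["number"].Captures)
        {
            values.Add(c.Value);
        }
        return values;
    }

    public static void Main()
    {
        foreach (string input in new[] { "[16]", "[16,34]", "[12,4,56,7,14,25,12]" })
        {
            List<string> values = Parse(input);
            Console.WriteLine(input + " -> " + values.Count + " item(s): " + string.Join(" ", values));
        }
    }
}
```

Note that the pattern is not anchored, so Match finds the first bracketed list anywhere in the input; add ^ and $ if the whole string has to be a list.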

Alternatively, you could do a string.Split with '[', ',' and ']' as
separator characters, which would probably be faster as well. You can
instruct string.Split to remove empty entries.

string[] results = "[16,23,1]".Split(new char[] { ',', '[', ']' },
    StringSplitOptions.RemoveEmptyEntries);
int number = results.Length;

I'd prefer this solution over the regex one.

Jesse
jac
Guest
 
Posts: n/a
#4: Jun 28 '07

re: RegEx problem


Because I can have 0 or multiple sets like 15,12,5,13, hence ((\d+)[,]?).
Within a set I can have 0 or 1 comma, but the set can occur multiple times
(example: [12,4,56,7,14,25,12]) or not at all, and in that case I think I end
up in the last part of the pattern (example: [45]).



Martin#
Guest
 
Posts: n/a
#5: Jun 28 '07

re: RegEx problem


Hello,

First, very good and detailed answer! (You got a positive rating from me.)

But I would prefer the string.Split solution that you also presented.
A quick test with a loop and two timestamps will show you why!
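
Such a quick test could be sketched as follows. This is an assumption-laden illustration, not the test that was actually run: it uses System.Diagnostics.Stopwatch instead of two timestamps, and the iteration count, input string and all names (SplitVersusRegexTiming, CountWithRegex, CountWithSplit) are arbitrary choices. No timings are claimed here; the numbers depend on the machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class SplitVersusRegexTiming
{
    private static readonly Regex NumberList =
        new Regex(@"\[(?<number>\d+)(?:,(?<number>\d+))*\]");

    private static readonly char[] Separators = { ',', '[', ']' };

    // Counts the numbers using the named-group regex from this thread.
    public static int CountWithRegex(string text)
    {
        return NumberList.Match(text).Groups["number"].Captures.Count;
    }

    // Counts the numbers using string.Split (no validation of the input).
    public static int CountWithSplit(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    public static void Main()
    {
        const string input = "[12,4,56,7,14,25,12]";
        const int iterations = 100000; // arbitrary, large enough to be measurable

        // Warm up both paths so one-time setup cost is not measured.
        CountWithRegex(input);
        CountWithSplit(input);

        Stopwatch regexWatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            CountWithRegex(input);
        }
        regexWatch.Stop();

        Stopwatch splitWatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            CountWithSplit(input);
        }
        splitWatch.Stop();

        Console.WriteLine("Regex: " + regexWatch.ElapsedMilliseconds + " ms");
        Console.WriteLine("Split: " + splitWatch.ElapsedMilliseconds + " ms");
    }
}
```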

All the best,

Martin

jac
Guest
 
Posts: n/a
#6: Jun 28 '07

re: RegEx problem


Thank you, it works nicely, and it was a very good description of how to read
a regex.


Jesse Houwing
Guest
 
Posts: n/a
#7: Jun 28 '07

re: RegEx problem


* Martin# wrote, On 28-6-2007 18:40:
Quote:
> Hello,
>
> First, very good and detailed answer! (You got a positive rating from me.)
Thank you :)
Quote:
> But I would prefer the string.Split solution that you also presented.
> A quick test with a loop and two timestamps will show you why!
I haven't tested it, but my guess is that the difference is major. Regex can
do beautiful things, but it isn't the best tool for every problem. As I
said before, I'd prefer this solution over the regex one: it's both
easier to read and faster. The only problem is that it doesn't validate
the input, whereas the regex would do that for you.

I'm not sure whether adding an int.TryParse would slow the loop you tried
enough to make it slower than a regex, though; my guess is that it would
still be faster than a regex.
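
A minimal sketch of the Split approach with int.TryParse added for validation might look like this. The names (SplitWithValidation, TryParseList) are hypothetical. The check is weaker than the regex: it rejects non-numeric entries, but it does not verify where the brackets and commas are placed, and int.TryParse also accepts a sign and surrounding whitespace.

```csharp
using System;
using System.Collections.Generic;

public static class SplitWithValidation
{
    private static readonly char[] Separators = { ',', '[', ']' };

    // Splits text on '[', ',' and ']' and parses every piece as an int.
    // Returns false if text is null, if any piece is not an integer,
    // or if no numbers were found at all.
    public static bool TryParseList(string text, out List<int> numbers)
    {
        numbers = new List<int>();
        if (text == null)
        {
            return false;
        }
        string[] parts = text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        foreach (string part in parts)
        {
            int value;
            if (!int.TryParse(part, out value))
            {
                numbers.Clear();
                return false;
            }
            numbers.Add(value);
        }
        return numbers.Count > 0;
    }

    public static void Main()
    {
        foreach (string input in new[] { "[16,23,1]", "[16,x]", "[]" })
        {
            List<int> numbers;
            bool ok = TryParseList(input, out numbers);
            Console.WriteLine(input + " -> valid: " + ok + ", count: " + numbers.Count);
        }
    }
}
```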

Quote:
> All the best,
And to you.

Jesse

Christof Nordiek
Guest
 
Posts: n/a
#8: Jun 29 '07

re: RegEx problem


"jac" <jac@discussions.microsoft.com> wrote in message
news:862EA457-C39C-41AF-AD91-5391436DA5FE@microsoft.com...
Quote:
> Because I can have 0 or multiple sets like 15,12,5,13, hence ((\d+)[,]?).
> Within a set I can have 0 or 1 comma, but the set can occur multiple times
> (example: [12,4,56,7,14,25,12]) or not at all, and in that case I think I end
> up in the last part of the pattern (example: [45]).
But the 45 would simply be the last number, which is already covered by the
final (\d+) in the regex, and the preceding group, the one with the comma,
would be matched zero times.
Actually that's the cause of the fault: the first part can match even
if there is no comma.
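
As a sketch of that point, the smallest change to the pattern from the question is to make the comma mandatory inside the repeated group, so the group can only match a complete number followed by a comma. The class and method names (MandatoryCommaDemo, Parse) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MandatoryCommaDemo
{
    // Same shape as the pattern in the question, but the comma is required:
    // zero or more "number," pairs, then the final number.
    private static readonly Regex NumberList =
        new Regex(@"\[(?:(\d+),)*(\d+)\]");

    // Collects the captures of group 1 (numbers followed by a comma)
    // and then group 2 (the last number), which preserves input order.
    public static List<string> Parse(string text)
    {
        var values = new List<string>();
        Match m = NumberList.Match(text);
        if (!m.Success)
        {
            return values;
        }
        for (int i = 1; i < m.Groups.Count; i++)
        {
            foreach (Capture c in m.Groups[i].Captures)
            {
                values.Add(c.Value);
            }
        }
        return values;
    }

    public static void Main()
    {
        foreach (string input in new[] { "[45]", "[16,34]", "[12,4,56,7,14,25,12]" })
        {
            Console.WriteLine(input + " -> " + string.Join(" | ", Parse(input)));
        }
        // [45] -> 45
        // [16,34] -> 16 | 34
    }
}
```

For [45] the repeated group matches zero times and the final (\d+) takes the whole number, which is the behaviour described above.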

Christof

