Regex repeating capture

Howdy,

I'm trying to break an input string into multiple pieces using a
series of delimiters that start with an asterisk. Following the
asterisk is a multi-character identifier immediately followed by a
data string of variable length. The input string may contain more than
one identifier anywhere in the string.

Here is an example:
*CZ1 2.3 4-56 *fuuuS24364 08 23 72

I'd like to break this into
CZ
1 2.3 4-56
fuuu
S24364 08 23 72

I have tried the pattern (?:\*(CZ|fuuu)(.*)), which produces the
following output:
CZ
1 2.3 4-56 *fuuuS24364 08 23 72

How can I force it to repeat the capturing?

Thanks,
Jay

Jan 30 '07 #1
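
For reference, a minimal sketch of why the pattern in the question yields only one match: (.*) is greedy, so after the first identifier it consumes the rest of the line, including the second delimiter. The class name GreedyDemo is made up for illustration; the pattern and input are the ones from the post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // The pattern from the question, unchanged.
    public const string Pattern = @"(?:\*(CZ|fuuu)(.*))";

    public static void Main()
    {
        string input = "*CZ1 2.3 4-56 *fuuuS24364 08 23 72";
        MatchCollection matches = Regex.Matches(input, Pattern);

        // A single match: the greedy (.*) has already consumed "*fuuu...".
        Console.WriteLine(matches.Count);               // 1
        Console.WriteLine(matches[0].Groups[1].Value);  // CZ
        Console.WriteLine(matches[0].Groups[2].Value);  // 1 2.3 4-56 *fuuuS24364 08 23 72
    }
}
```

Repeating the capture therefore means stopping the second group before the next delimiter, not adding a quantifier around the whole pattern.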
7 Replies


Use:

    public string[] Split(params char[] separator)

to split your string on the asterisk as a first step.

Now you can enumerate over the string array, splitting out your identifiers
and data strings. You could use a StringBuilder to build whatever you want
to output.

Now you can use:

    public bool StartsWith(string value)

and

    public string Substring(int startIndex)

e.g.

    // Requires: using System.Text;
    // "input" is assumed to hold the string from the question.
    string[] strArray = input.Split('*');
    StringBuilder sb = new StringBuilder();
    foreach (string s in strArray)
    {
        // Split yields an empty first element when the input starts with '*'.
        if (s.Length == 0)
            continue;

        if (s.StartsWith("CZ"))
        {
            // One piece per line, as in the question.
            sb.AppendLine("CZ");
            sb.AppendLine(s.Substring(2));
        }
        else
        {
            sb.AppendLine("fuuu");
            sb.AppendLine(s.Substring(4));
        }
    }

    return sb.ToString();

I'm sure there's an easier way using a Regex, but I can't be bothered to
puzzle it out.

HTH
Peter


Jan 30 '07 #2

Sorry, this was a simplified example. In all, there are 50+ identifiers,
and * is allowed in the data as long as it isn't immediately followed by
an identifier; otherwise it is treated as the start of another identifier.

Jan 30 '07 #3

So, to split based on an * using a regular expression:

string pattern = @"\*(?<Text>[^\*]+)";
string input = "*CZ1 2.3 4-56 *fuuuS24364 08 23 72";
Match match = Regex.Match(input, pattern);

while (match.Success) {
    Console.WriteLine(match.Groups["Text"].Value);
    match = match.NextMatch();
}

HTH,
Mythran


Jan 30 '07 #4

Ah, I didn't know you wanted to break it out into identifier, text,
identifier, text... so please disregard my previous post :P. Do you
know if the identifier is always 4 characters? I hope so; the following
example shows how to achieve this:

string pattern = @"\*(?<Identifier>.{4})(?<Value>[^\*]+)";
string input = "*CZ1 2.3 4-56 *fuuuS24364 08 23 72";
Match match = Regex.Match(input, pattern);

while (match.Success) {
    Console.WriteLine(
        "Identifier: {0} - Value: {1}",
        match.Groups["Identifier"].Value,
        match.Groups["Value"].Value
    );
    match = match.NextMatch();
}

HTH,
Mythran
Jan 30 '07 #5
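
A quick check of what the fixed-width pattern above does with the sample input may help here. This is only a sketch (the class name FixedWidthDemo is invented); it shows that .{4} takes exactly four characters, so a two-character identifier such as CZ picks up the first two characters of its data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FixedWidthDemo
{
    // Pattern from the post above: exactly four characters are taken as the identifier.
    public const string Pattern = @"\*(?<Identifier>.{4})(?<Value>[^\*]+)";

    public static void Main()
    {
        string input = "*CZ1 2.3 4-56 *fuuuS24364 08 23 72";
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // Brackets make leading and trailing spaces visible.
            Console.WriteLine("[{0}] [{1}]",
                m.Groups["Identifier"].Value,
                m.Groups["Value"].Value);
        }
        // Prints:
        // [CZ1 ] [2.3 4-56 ]
        // [fuuu] [S24364 08 23 72]
    }
}
```

So the pattern only gives the desired split when every identifier really is four characters long.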

The identifier is at least 2 characters long, but has no upper limit.
Thanks,
Jay

Jan 30 '07 #6

I know what the identifiers are, so I'm okay with replacing the .{4}
with (Identifier1|Identifier2|...|IdentifierN) at run time. However, I
cannot blindly end the data capture on an asterisk. "*CZ1 2.3 4*A56
*fuuuS24364 08 23 72" is also valid, provided *A56 does not start with a
valid identifier. The data capture can only end when it encounters another
valid identifier.

Jan 30 '07 #7
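
One way to express "the data ends only at another valid identifier" is a lazy data group followed by a lookahead. The sketch below is not from the thread; it assumes CZ and fuuu are the only valid identifiers, and the names LookaheadDemo and Parse are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Assumption: only CZ and fuuu are valid identifiers in this sketch.
    private const string Ids = "CZ|fuuu";

    // The value is matched lazily and must be followed by "*<identifier>"
    // or by the end of the input; the lookahead does not consume anything.
    private static readonly Regex Segment = new Regex(
        @"\*(?<Identifier>" + Ids + @")(?<Value>.*?)(?=\*(?:" + Ids + @")|\z)",
        RegexOptions.Singleline);

    public static List<(string Identifier, string Value)> Parse(string input)
    {
        var result = new List<(string Identifier, string Value)>();
        foreach (Match m in Segment.Matches(input))
        {
            // Trim() drops the space that precedes the next delimiter.
            result.Add((m.Groups["Identifier"].Value, m.Groups["Value"].Value.Trim()));
        }
        return result;
    }

    public static void Main()
    {
        foreach (var (id, value) in Parse("*CZ1 2.3 4*A56 *fuuuS24364 08 23 72"))
        {
            Console.WriteLine(id);
            Console.WriteLine(value);
        }
        // Prints:
        // CZ
        // 1 2.3 4*A56
        // fuuu
        // S24364 08 23 72
    }
}
```

Because *A56 is not followed by a listed identifier, the asterisk stays inside the first data string. Text before the first valid identifier is simply skipped in this sketch.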

How many identifiers are there? If it is a short list (say, fewer than
10 or so), you can use the regex alternation character '|' in the pattern
to list the valid identifiers instead of matching on the asterisk
alone.

HTH,
Mythran

Jan 31 '07 #8
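
With 50+ identifiers the alternation is probably easier to build at run time than to type by hand. A hedged sketch follows: PatternBuilder and BuildAlternation are invented names, and the identifier CZA is hypothetical, added only to show why ordering matters. The .NET engine tries alternatives left to right, so a longer identifier has to come before any shorter one it starts with.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternBuilder
{
    // Builds "ID1|ID2|..." from a list of identifiers known at run time.
    public static string BuildAlternation(IEnumerable<string> identifiers)
    {
        return string.Join("|", identifiers
            .OrderByDescending(id => id.Length)   // longest first, so "CZA" is tried before "CZ"
            .Select(id => Regex.Escape(id)));     // identifiers are treated as literal text
    }

    public static void Main()
    {
        // Hypothetical identifier list; "CZA" is not from the thread.
        string ids = BuildAlternation(new[] { "CZ", "fuuu", "CZA" });
        Console.WriteLine(ids);                   // fuuu|CZA|CZ

        Match m = Regex.Match("*CZA12", @"\*(?<Identifier>" + ids + ")");
        Console.WriteLine(m.Groups["Identifier"].Value);   // CZA
    }
}
```

Whether the longest identifier should win when one is a prefix of another is an assumption here; the data format in the thread does not say.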
