Regex repeating capture

Howdy,

I'm trying to break an input string into multiple pieces using a
series of delimiters that start with an asterisk. The asterisk is
followed by a multi-character identifier, which is immediately followed
by a data string of variable length. The input string may contain more
than one identifier, anywhere in the string.

Here is an example:
*CZ1 2.3 4-56 *fuuuS24364 08 23 72

I'd like to break this into
CZ
1 2.3 4-56
fuuu
S24364 08 23 72

I have tried the pattern (?:\*(CZ|fuuu)(.*)), which produces the
following output:
CZ
1 2.3 4-56 *fuuuS24364 08 23 72

How can I force it to repeat the capturing?

Thanks,
Jay
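Only one piece comes back because (.*) is greedy: it runs to the end of the input, so nothing is left over for a second match. The C# sketch below compares the original pattern with a lazy (.*?) that stops at a lookahead for the next *CZ or *fuuu, or at the end of the string. It assumes CZ and fuuu are the only identifiers, as in the example. The class and method names (GreedyVsLazy, ParseGreedy, ParseLazy) are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sketch only: assumes CZ and fuuu are the only identifiers, as in the example.
static class GreedyVsLazy
{
    // The pattern from the question: (.*) is greedy and runs to the end of the input.
    const string Greedy = @"\*(CZ|fuuu)(.*)";

    // Lazy (.*?) stops at the first point where the lookahead sees optional
    // whitespace followed by "*CZ" or "*fuuu", or the end of the input.
    const string Lazy = @"\*(CZ|fuuu)(.*?)(?=\s*\*(?:CZ|fuuu)|\s*$)";

    // Collects (identifier, data) for every match of the pattern.
    static List<(string Id, string Data)> Parse(string input, string pattern)
    {
        var result = new List<(string Id, string Data)>();
        foreach (Match m in Regex.Matches(input, pattern))
            result.Add((m.Groups[1].Value, m.Groups[2].Value));
        return result;
    }

    public static List<(string Id, string Data)> ParseGreedy(string input) => Parse(input, Greedy);

    public static List<(string Id, string Data)> ParseLazy(string input) => Parse(input, Lazy);
}
```

The \s* inside the lookahead keeps the space before the next asterisk out of the data, so the lazy version returns the pieces in the form shown in the desired output.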

Jan 30 '07 #1
Use:

public string[] Split(params char[] separator)

to split your string on the asterisk as a first step.

Now you can enumerate over the string array, splitting out your identifiers
and data strings. You could use a StringBuilder to build whatever you want
to output.

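Now you can use:

public bool StartsWith(string value)

and

public string Substring(int startIndex)

e.g.:

using System;
using System.Text;

// SplitPieces is just a name for this sketch.
static string SplitPieces(string input)
{
    // First step: split on the asterisk. RemoveEmptyEntries drops the empty
    // string in front of the leading '*', which would otherwise make
    // Substring(4) throw.
    string[] strArray = input.Split(new[] { '*' }, StringSplitOptions.RemoveEmptyEntries);

    StringBuilder sb = new StringBuilder();
    foreach (string s in strArray)
    {
        if (s.StartsWith("CZ"))
        {
            sb.AppendLine("CZ");
            sb.AppendLine(s.Substring(2).Trim());   // data after the 2-character identifier
        }
        else
        {
            // Only two identifiers in the example, so anything else is "fuuu".
            sb.AppendLine("fuuu");
            sb.AppendLine(s.Substring(4).Trim());   // data after the 4-character identifier
        }
    }

    return sb.ToString();
}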

I'm sure there's an easier way using a Regex, but I can't be bothered to
puzzle it out.

HTH
Peter
Jan 30 '07 #2
Sorry, this was a simple example. In all, there are 50+ identifiers,
and * is allowed in the data as long as it isn't immediately followed by
an identifier; otherwise it is considered the start of another identifier.

(In reply to Peter Bradley, Jan 30, 11:58 am)
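Because an asterisk may appear inside the data, splitting on every asterisk (as in the first reply) is not enough. The short C# check below applies string.Split to the input Jay gives later in the thread, "*CZ1 2.3 4*A56 *fuuuS24364 08 23 72". The NaiveSplit name is hypothetical.

```csharp
using System;

static class NaiveSplit
{
    // Splits on every '*', including one that is part of the data.
    // RemoveEmptyEntries drops the empty string before the leading '*'.
    public static string[] Pieces(string input) =>
        input.Split(new[] { '*' }, StringSplitOptions.RemoveEmptyEntries);
}
```

For that input the pieces are "CZ1 2.3 4", "A56 " and "fuuuS24364 08 23 72". The CZ data ends up split across two pieces, and the second piece no longer starts with a known identifier.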
Jan 30 '07 #3
So, to split based on an * using a regular expression:

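using System;
using System.Text.RegularExpressions;

// Captures everything after each '*' up to (not including) the next '*'.
string pattern = @"\*(?<Text>[^\*]+)";
string input = "*CZ1 2.3 4-56 *fuuuS24364 08 23 72";
Match match = Regex.Match(input, pattern);

while (match.Success) {
    Console.WriteLine(match.Groups["Text"].Value);
    match = match.NextMatch();
}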

HTH,
Mythran


Jan 30 '07 #4
Ahh, I didn't know you wanted to break it out into identifier, text,
identifier, text... so my previous post should be disregarded :P. Do you
know whether the identifier is always 4 characters? Hopefully it is; the
following example shows how to achieve that:

using System;
using System.Text.RegularExpressions;

// Assumes every identifier is exactly 4 characters long.
string pattern = @"\*(?<Identifier>.{4})(?<Value>[^\*]+)";
string input = "*CZ1 2.3 4-56 *fuuuS24364 08 23 72";
Match match = Regex.Match(input, pattern);

while (match.Success) {
    Console.WriteLine(
        "Identifier: {0} - Value: {1}",
        match.Groups["Identifier"].Value,
        match.Groups["Value"].Value
    );
    match = match.NextMatch();
}

HTH,
Mythran
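The .{4} group assumes a fixed-width identifier, but the sample input does not satisfy that: CZ is only two characters. The small C# wrapper below runs the same pattern against the original example to show what it actually returns. The FixedWidthId name is hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class FixedWidthId
{
    // The pattern from the post above: exactly four identifier characters,
    // then data up to (not including) the next '*'.
    const string Pattern = @"\*(?<Identifier>.{4})(?<Value>[^\*]+)";

    public static List<(string Identifier, string Value)> Parse(string input)
    {
        var result = new List<(string Identifier, string Value)>();
        foreach (Match m in Regex.Matches(input, Pattern))
            result.Add((m.Groups["Identifier"].Value, m.Groups["Value"].Value));
        return result;
    }
}
```

The first identifier comes back as "CZ1 " and its data loses the leading "1". The fuuu pair is correct only because that identifier happens to be four characters long.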
Jan 30 '07 #5
Jay
The identifier is at least 2 characters long, but has no upper limit.
Thanks,
Jay

(In reply to Mythran, Jan 30, 12:36 pm)
Jan 30 '07 #6
Jay
I know what the identifiers are, so I'm okay with replacing the .{4}
with (Identifier1|Identifier2|...|IdentifierN) at run time. However, I
cannot blindly end the data capture on an asterisk. "*CZ1 2.3 4*A56
*fuuuS24364 08 23 72" is also valid, provided *A6 is not a valid
identifier. The data capture can only end when it encounters another
valid identifier.

(In reply to Jay, Jan 30, 12:52 pm)
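One way to express the rule that data ends only at another valid identifier is sketched below in C#. It assumes the full identifier list is available at run time. The identifiers are joined into an alternation, each escaped with Regex.Escape. The data is taken lazily and ended by a lookahead that accepts only whitespace plus an asterisk plus a known identifier, or the end of the input. The KnownIdentifiers, Build and Parse names are hypothetical. Dropping the whitespace before the next identifier is an assumption based on the desired output in the first post.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class KnownIdentifiers
{
    // Builds the pattern at run time from the known identifiers.
    // Regex.Escape protects identifiers that contain regex metacharacters.
    public static Regex Build(IEnumerable<string> identifiers)
    {
        string ids = string.Join("|", identifiers.Select(Regex.Escape));
        // The data is taken lazily and ends only where the lookahead finds
        // optional whitespace, a '*' and a known identifier, or the end of
        // the input. Any other '*' stays in the data.
        return new Regex($@"\*(?<Id>{ids})(?<Data>.*?)(?=\s*\*(?:{ids})|\s*$)");
    }

    // Collects (identifier, data) pairs for every match.
    public static List<(string Id, string Data)> Parse(Regex regex, string input)
    {
        var result = new List<(string Id, string Data)>();
        foreach (Match m in regex.Matches(input))
            result.Add((m.Groups["Id"].Value, m.Groups["Data"].Value));
        return result;
    }
}
```

With CZ and fuuu as the identifier list, the CZ data comes back as "1 2.3 4*A56". The asterisk before A56 stays in the data because A56 does not start with a known identifier.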
Jan 30 '07 #7
How many identifiers are there? If it's a small list (say, fewer than
10 or so), you can use the regex alternation character '|' in the pattern
to list the valid identifiers, instead of matching on the asterisk
alone.

HTH,
Mythran
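When the alternation is generated from the full identifier list, the order of the alternatives matters. The default .NET regex engine tries alternatives from left to right and keeps the first one that lets the match succeed. As a result, a short identifier can shadow a longer one that starts with the same characters. The minimal C# sketch below assumes a hypothetical identifier C alongside CZ, and assumes the longest identifier should win. The OrderedAlternation names are also hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class OrderedAlternation
{
    // Joins the identifiers into "A|B|..." (escaped). With longestFirst,
    // longer identifiers are listed before shorter ones; OrderByDescending
    // is stable, so equal-length identifiers keep their original order.
    public static string Alternation(IEnumerable<string> identifiers, bool longestFirst)
    {
        IEnumerable<string> ids = longestFirst
            ? identifiers.OrderByDescending(s => s.Length)
            : identifiers;
        return string.Join("|", ids.Select(Regex.Escape));
    }

    // Returns the identifier captured by the first match of "*<identifier>",
    // or an empty string if nothing matches.
    public static string FirstIdentifier(string input, IEnumerable<string> identifiers, bool longestFirst)
    {
        string ids = Alternation(identifiers, longestFirst);
        Match m = Regex.Match(input, $@"\*(?<Id>{ids})");
        return m.Success ? m.Groups["Id"].Value : string.Empty;
    }
}
```

With the list in its original order ("C|CZ|fuuu"), "*CZ1 2.3 4-56" is read as identifier C followed by data "Z1 ...". Sorted longest-first, it is read as CZ. Sorting by length is one simple rule. Whether the longest identifier should always win depends on how the real identifiers are defined, which the thread doesn't say.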

Jan 31 '07 #8
