
Regex puzzle

Can anyone help me figure out a regex pattern for the following input
example:

xxx:a=b,c=d,yyy:e=f,zzz:www:g=h,i=j,l=m

I would want four matches from this:
1. xxx a=b,c=d
2. yyy e=f
3. zzz (empty)
4. www g=h,i=j,l=m

None of the letters here are single letters, but rather placeholders for
arbitrary words. For example,

LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127,CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP

Would result in:
1. LTG LTG=2-41-53-57
2. JOB JN=113&&116&125&&127
3. CPT CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP

Everything I've come up with so far would require me to iterate over
substrings. It'd be nice to have just a single matching operation. TIA.

-- Alan
Jul 19 '05 #1
3 Replies


How about?
(\w+):([^:]+)?,(\w+):([^:]+)?,(\w+):([^:]+)?

Go to http://www.organicbit.com/regex/fog0000000019.html and get the regex
tool, it's handy for building these things.

The tool helps while you are writing the regex, but it is cumbersome for
verifying that the regex behaves correctly across a large set of inputs.
For that you would be better off with a unit test app that stores an array
of (input, output) pairs, runs the regex on each input, and compares the
result to the expected output. (Example below)

-Dino
//
// emailValidation.cs
//
// uses a regexp to validate emails.
// This test program uses xml serialization to get the test input,
// including the regexp string and the various emails to test.
//
// references:
// http://homepage.stts.edu/~agushen/sc...alidation.html
//
// Fri, 15 Aug 2003 11:28
//

using Ionic.Test.EmailValidation;

namespace Ionic.Test.EmailValidation {

/// <remarks>
/// Represents all the input for the test, including the regex to test,
/// and an array of test cases.
/// </remarks>
[System.Xml.Serialization.XmlRootAttribute("Email.Validation.Input",
Namespace="", IsNullable=false)]
public class TestInput {

/// <remarks/>

[System.Xml.Serialization.XmlElementAttribute(Form=System.Xml.Schema.XmlSchemaForm.Unqualified)]
public string Regexp;

/// <remarks/>

[System.Xml.Serialization.XmlArrayAttribute(Form=System.Xml.Schema.XmlSchemaForm.Unqualified)]
[System.Xml.Serialization.XmlArrayItemAttribute("Case",
Form=System.Xml.Schema.XmlSchemaForm.Unqualified, IsNullable=false)]
public TestCase[] TestList;
}
/// <remarks>
/// This is the type that stores a single test case.
/// We need a bunch of these to verify that the regex works as
/// expected. Each test case has an input and an output. In our
/// case, the input is a string, and the output is a bool value,
/// which indicates whether the Regex should match or not.
/// Other tests will have different input and output.
/// </remarks>
public class TestCase {

/// <remarks/>

[System.Xml.Serialization.XmlElementAttribute(Form=System.Xml.Schema.XmlSchemaForm.Unqualified)]
public string Input;

/// <remarks/>

[System.Xml.Serialization.XmlElementAttribute(Form=System.Xml.Schema.XmlSchemaForm.Unqualified)]
public bool ExpectedOutput;
}
/// <remarks>
/// This is the test app. The main routine de-serializes from
/// an XML file, then runs the tests, comparing the expected
/// (or desired) output with the actual result.
/// </remarks>
public class Tester {

public static void Main() {
string InputPath= "EmailValidationInput.xml";

System.IO.FileStream fs = new System.IO.FileStream(InputPath,
System.IO.FileMode.Open);
System.Xml.Serialization.XmlSerializer s= new
System.Xml.Serialization.XmlSerializer(typeof(TestInput));
TestInput Input= (TestInput) s.Deserialize(fs);
fs.Close();

System.Text.RegularExpressions.Regex regex= new
System.Text.RegularExpressions.Regex (Input.Regexp);

foreach (TestCase tc in Input.TestList) {
System.Console.WriteLine(tc.Input +"\n " + tc.ExpectedOutput + " \\ " +
regex.IsMatch(tc.Input));
}
}
}
}
This is input data. Store this in the XML file that is de-serialized for
this test.

<Email.Validation.Input>
<TestList>
<!-- ================================================================== -->
<!-- =================== True test cases ============================== -->
<!-- ================================================================== -->

<Case>
<Input>Ro***@rabbit.com</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>th*********************************@something.org</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>th*******@something.9g</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>th*******@place.org</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>We***********@cornell.edu</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>Ja***********@sun-east.com</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>Ja***********@sun.east.com</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>Ja***********@sun.com</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>Pr*******@rolling-hills.club.org</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>9L****@club.org</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>fr**@somewhere.org9</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>f@z.k</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>_e***@sesame.org</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>Ha**********@Hogwarts.edu</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>Pr************************@Faculty.Hogwarts.edu</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>

<!-- ================================================================== -->
<!-- =================== False test cases ============================= -->
<!-- ================================================================== -->

<Case>
<Input>-e***@sesame.org</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>el**@sesame.org.</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>-e***@sesame.org.</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>elmo@.org.</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>elmo@.org</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>elmo@.someplace.org</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>elmo@cloud9</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>fred.@somewhere.org9</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>fred@somewhere..org9</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>9Lives.club.org</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>@club.org</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>.so*****@club.org</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>

</TestList>
<Regexp>^(\w([\.\-\w]*\w)?)@(\w([\.\-\w]*\w)*\.\w([\.\-\w]*\w)?)$</Regexp>
</Email.Validation.Input>
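
The harness above prints the expected and actual values side by side and leaves the comparison to the reader. Below is a minimal sketch of the same idea with the comparison done in code. It assumes the cases are held in memory rather than de-serialized from XML; the names `Case`, `InMemoryTester` and `CountFailures` are made up for illustration, and only the unmasked inputs from the XML above are used.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical in-memory stand-in for the XML-backed TestCase class above.
public class Case
{
    public string Input;
    public bool Expected;

    public Case(string input, bool expected)
    {
        Input = input;
        Expected = expected;
    }
}

public static class InMemoryTester
{
    // The regex from the <Regexp> element above, unchanged.
    public const string Pattern =
        @"^(\w([\.\-\w]*\w)?)@(\w([\.\-\w]*\w)*\.\w([\.\-\w]*\w)?)$";

    // A subset of the cases from the XML file (unmasked inputs only).
    public static readonly Case[] Cases =
    {
        new Case("f@z.k", true),
        new Case("elmo@.org.", false),
        new Case("elmo@.org", false),
        new Case("elmo@.someplace.org", false),
        new Case("elmo@cloud9", false),
        new Case("fred.@somewhere.org9", false),
        new Case("fred@somewhere..org9", false),
        new Case("9Lives.club.org", false),
        new Case("@club.org", false),
    };

    // Returns how many cases produced a result different from the expected one.
    public static int CountFailures(string pattern, Case[] cases)
    {
        Regex regex = new Regex(pattern);
        int failures = 0;
        foreach (Case c in cases)
        {
            bool actual = regex.IsMatch(c.Input);
            if (actual != c.Expected)
            {
                failures++;
                Console.WriteLine("FAIL: " + c.Input +
                    " expected " + c.Expected + " got " + actual);
            }
        }
        return failures;
    }

    public static void Main()
    {
        int failures = CountFailures(Pattern, Cases);
        Console.WriteLine(failures + " of " + Cases.Length + " cases failed");
    }
}
```

Returning a failure count, rather than only printing, makes it easy to fail a build or exit with a non-zero code when the regex changes behavior.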

Jul 19 '05 #2

"Dino Chiesa [MSFT]" <di****@microsoft.com> wrote in message
news:uU**************@tk2msftngp13.phx.gbl...
How about?
(\w+):([^:]+)?,(\w+):([^:]+)?,(\w+):([^:]+)?


Dino,

Your regex fails (no match) with a simple test, CMD:PARM=X, and I didn't
have much luck with others I tried. For example, my OP had this example,

LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127,CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP

Your regex gives this result:
1 matches.
Match 1 has 7 groups.
Group 1 = "LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127,CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP"
Group 2 = "LTG"
Group 3 = "LTG=2-41-53-57"
Group 4 = "JOB"
Group 5 = "JN=113&&116&125&&127"
Group 6 = "CPT"
Group 7 = "CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP"

But I was looking for something more along the lines of (Group 2 & 3 in each
match are the desired values):
3 matches.
Match 1 has 3 groups.
Group 1 = "LTG:LTG=2-41-53-57"
Group 2 = "LTG"
Group 3 = "LTG=2-41-53-57"
Match 2 has 3 groups.
Group 1 = "JOB:JN=113&&116&125&&127"
Group 2 = "JOB"
Group 3 = "JN=113&&116&125&&127"
Match 3 has 3 groups.
Group 1 = "CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP"
Group 2 = "CPT"
Group 3 = "CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP"

But thanks for your advice. I will study what you supplied to try to
understand it as well. Thanks!

-- Alan
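
A short sketch of why the proposed pattern behaves this way: it spells out `tag:params` three times with literal commas in between, so input with fewer than three `tag:` items cannot match, and input with exactly three comes back as one match with six groups. The class and method names below are hypothetical; the pattern itself is the one quoted above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FixedCountDemo
{
    // The pattern proposed earlier in the thread, unchanged.
    public const string Pattern =
        @"(\w+):([^:]+)?,(\w+):([^:]+)?,(\w+):([^:]+)?";

    // Number of separate matches the pattern finds in the input.
    public static int CountMatches(string input)
    {
        return Regex.Matches(input, Pattern).Count;
    }

    public static void Main()
    {
        // No commas in the input, so the two literal commas cannot match.
        Console.WriteLine(CountMatches("CMD:PARM=X"));

        // Exactly three items: the whole input is consumed by a single match.
        Console.WriteLine(CountMatches(
            "LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127," +
            "CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP"));
    }
}
```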
Jul 19 '05 #3

Try the following:

Regex regex = new Regex(@"
( # overall repetition
(?<Item> # Capture to item
(?<Tag>.*?) # Any character, zero or more times, non-greedy
: # literal :
.*? # any character, zero or more times, non-greedy
) # end of capture
,? # optional "","". This eats the comma between the Items
(?= # zero-width positive lookahead. This must match at this spot
(\w+: # one or more word characters, followed by a literal :
| # or
$ # end of line
)
)
)+ # one or more times",
RegexOptions.ExplicitCapture |
RegexOptions.Compiled |
RegexOptions.Singleline |
RegexOptions.IgnorePatternWhitespace);

The key to this is the zero-width lookahead. It ensures that the part after
the match is either <xxx>:, or the end of the string, without eating any of
the characters. As you've probably found, without this there's no way to
know whether you should include a comma or break on it.

Here's the output I get from my regex workbench:

Matching:
LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127,CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP
Item => LTG:LTG=2-41-53-57
Item => JOB:JN=113&&116&125&&127
Item => CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP
Tag => LTG
Tag => JOB
Tag => CPT
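
The same lookahead idea can also be arranged to give one match per item, which is closer to the "3 matches, 3 groups each" shape asked for earlier in the thread. The sketch below is a variation, not the pattern above: it drops the outer repetition, restricts the tag to word characters, and captures the parameters in a group named `Params`. The names `ItemSplitter`, `Parse` and `Params` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ItemSplitter
{
    // One Match per item; the lookahead decides where an item ends
    // without consuming the start of the next one.
    private static readonly Regex ItemRegex = new Regex(@"
        (?<Tag>\w+)      # tag, one or more word characters
        :                # literal colon
        (?<Params>.*?)   # parameters, zero or more characters, non-greedy
        ,?               # optional comma between this item and the next
        (?=              # zero-width lookahead, what follows must be
            \w+:         #   the next tag and its colon
            |            #   or
            $            #   the end of the input
        )",
        RegexOptions.ExplicitCapture |
        RegexOptions.Singleline |
        RegexOptions.IgnorePatternWhitespace);

    // Returns (tag, parameters) pairs in input order.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var items = new List<KeyValuePair<string, string>>();
        foreach (Match m in ItemRegex.Matches(input))
        {
            items.Add(new KeyValuePair<string, string>(
                m.Groups["Tag"].Value, m.Groups["Params"].Value));
        }
        return items;
    }

    public static void Main()
    {
        foreach (var item in Parse("xxx:a=b,c=d,yyy:e=f,zzz:www:g=h,i=j,l=m"))
        {
            Console.WriteLine(item.Key + " => \"" + item.Value + "\"");
        }
    }
}
```

For the four-item example from the original question this yields `xxx`, `yyy`, `zzz` (with empty parameters) and `www` as separate matches. Like any pattern built on this lookahead, it assumes a parameter value never contains a word immediately followed by a colon; such a value would be treated as the start of a new item.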

--
Eric Gunnerson

Visit the C# product team at http://www.csharp.net
Eric's blog is at http://blogs.gotdotnet.com/ericgu/

This posting is provided "AS IS" with no warranties, and confers no rights.
Jul 19 '05 #4
