Regex puzzle

Can anyone help me figure out a regex pattern for the following input
example:

xxx:a=b,c=d,yyy:e=f,zzz:www:g=h,i=j,l=m

I would want four matches from this:
1. xxx a=b,c=d
2. yyy e=f
3. zzz (empty)
4. www g=h,i=j,l=m

The letters here are not single letters, but rather placeholders for
arbitrary words. For example,

LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127,CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP

Would result in:
1. LTG LTG=2-41-53-57
2. JOB JN=113&&116&125&&127
3. CPT CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP

Everything I've come up with so far would require me to iterate over
substrings. It'd be nice to have just a single matching operation. TIA.

-- Alan
Jul 19 '05 #1
How about?
(\w+):([^:]+)?,(\w+):([^:]+)?,(\w+):([^:]+)?

Go to http://www.organicbit.com/regex/fog0000000019.html and get the regex
tool; it's handy for building these things.

The tool helps while you are writing the regex, but it is cumbersome when you
want to verify that the regex matches correctly across a large set of
inputs. For that you would be better off with a unit test app, where you
store an array of (input, output) pairs, run the regex on each input,
and compare the result to the expected output. (Example below)

-Dino
//
// emailValidation.cs
//
// uses a regexp to validate emails.
// This test program uses xml serialization to get the test input,
// including the regexp string and the various emails to test.
//
// references:
// http://homepage.stts.edu/~agushen/sc...alidation.html
//
// Fri, 15 Aug 2003 11:28
//

using Ionic.Test.EmailValidation;

namespace Ionic.Test.EmailValidation {

    /// <remarks>
    /// Represents all the input for the test, including the regex to test,
    /// and an array of test cases.
    /// </remarks>
    [System.Xml.Serialization.XmlRootAttribute("Email.Validation.Input",
        Namespace="", IsNullable=false)]
    public class TestInput {

        /// <remarks/>
        [System.Xml.Serialization.XmlElementAttribute(
            Form=System.Xml.Schema.XmlSchemaForm.Unqualified)]
        public string Regexp;

        /// <remarks/>
        [System.Xml.Serialization.XmlArrayAttribute(
            Form=System.Xml.Schema.XmlSchemaForm.Unqualified)]
        [System.Xml.Serialization.XmlArrayItemAttribute("Case",
            Form=System.Xml.Schema.XmlSchemaForm.Unqualified, IsNullable=false)]
        public TestCase[] TestList;
    }

    /// <remarks>
    /// This is the type that stores a single test case.
    /// We need a bunch of these to verify that the regex works as
    /// expected. Each test case has an input and an output. In our
    /// case, the input is a string, and the output is a bool value,
    /// which indicates whether the Regex should match or not.
    /// Other tests will have different input and output.
    /// </remarks>
    public class TestCase {

        /// <remarks/>
        [System.Xml.Serialization.XmlElementAttribute(
            Form=System.Xml.Schema.XmlSchemaForm.Unqualified)]
        public string Input;

        /// <remarks/>
        [System.Xml.Serialization.XmlElementAttribute(
            Form=System.Xml.Schema.XmlSchemaForm.Unqualified)]
        public bool ExpectedOutput;
    }

    /// <remarks>
    /// This is the test app. The main routine de-serializes from
    /// an XML file, then runs the tests, comparing the expected
    /// (or desired) output with the actual result.
    /// </remarks>
    public class Tester {

        public static void Main() {
            string InputPath = "EmailValidationInput.xml";

            // Read the regex and the test cases from the XML file.
            System.IO.FileStream fs = new System.IO.FileStream(InputPath,
                System.IO.FileMode.Open);
            System.Xml.Serialization.XmlSerializer s =
                new System.Xml.Serialization.XmlSerializer(typeof(TestInput));
            TestInput Input = (TestInput) s.Deserialize(fs);
            fs.Close();

            System.Text.RegularExpressions.Regex regex =
                new System.Text.RegularExpressions.Regex(Input.Regexp);

            // Print each input, followed by "expected \ actual".
            foreach (TestCase tc in Input.TestList) {
                System.Console.WriteLine(tc.Input + "\n " + tc.ExpectedOutput +
                    " \\ " + regex.IsMatch(tc.Input));
            }
        }
    }
}

This is input data. Store this in the XML file that is de-serialized for
this test.

<Email.Validation.Input>
<TestList>
<!-- ================================================================== -->
<!-- =================== True test cases ============================== -->
<!-- ================================================================== -->

<Case>
<Input>Ro***@rabbit.com</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>th*********************************@something.org</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>th*******@something.9g</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>th*******@place.org</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>We***********@cornell.edu</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>Ja***********@sun-east.com</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>Ja***********@sun.east.com</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>Ja***********@sun.com</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>Pr*******@rolling-hills.club.org</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>9L****@club.org</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>fr**@somewhere.org9</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>f@z.k</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>_e***@sesame.org</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>Ha**********@Hogwarts.edu</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>
<Case>
<Input>Pr************************@Faculty.Hogwarts.edu</Input>
<ExpectedOutput>true</ExpectedOutput>
</Case>

<!-- ================================================================== -->
<!-- =================== False test cases ============================= -->
<!-- ================================================================== -->

<Case>
<Input>-e***@sesame.org</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>el**@sesame.org.</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>-e***@sesame.org.</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>elmo@.org.</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>elmo@.org</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>elmo@.someplace.org</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>elmo@cloud9</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>fred.@somewhere.org9</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>fred@somewhere..org9</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>9Lives.club.org</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>@club.org</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>
<Case>
<Input>.so*****@club.org</Input>
<ExpectedOutput>false</ExpectedOutput>
</Case>

</TestList>
<Regexp>^(\w([\.\-\w]*\w)?)@(\w([\.\-\w]*\w)*\.\w([\.\-\w]*\w)?)$</Regexp>
</Email.Validation.Input>
Jul 19 '05 #2
"Dino Chiesa [MSFT]" <di****@microsoft.com> wrote in message
news:uU**************@tk2msftngp13.phx.gbl...
How about?
(\w+):([^:]+)?,(\w+):([^:]+)?,(\w+):([^:]+)?


Dino,

Your regex fails (no match) with a simple test, CMD:PARM=X, and I didn't
have much luck with others I tried. For example, my OP had this example,

LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127,CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP

Your regex gives this result:
1 matches.
Match 1 has 7 groups.
Group 1 = "LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127,CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP"
Group 2 = "LTG"
Group 3 = "LTG=2-41-53-57"
Group 4 = "JOB"
Group 5 = "JN=113&&116&125&&127"
Group 6 = "CPT"
Group 7 = "CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP"

But I was looking for something more along the lines of (Group 2 & 3 in each
match are the desired values):
3 matches.
Match 1 has 3 groups.
Group 1 = "LTG:LTG=2-41-53-57"
Group 2 = "LTG"
Group 3 = "LTG=2-41-53-57"
Match 2 has 3 groups.
Group 1 = "JOB:JN=113&&116&125&&127"
Group 2 = "JOB"
Group 3 = "JN=113&&116&125&&127"
Match 3 has 3 groups.
Group 1 = "CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP"
Group 2 = "CPT"
Group 3 = "CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP"
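
The two observations above (no match for CMD:PARM=X, and a single match for the three-item example) can be reproduced with a short check. This is only a sketch: the class and method names are made up for illustration, and the pattern is copied unchanged from the quoted reply.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FixedPatternCheck
{
    // Pattern from the quoted reply, unchanged. It spells out exactly
    // three "word:stuff" items separated by two commas.
    public const string Pattern =
        @"(\w+):([^:]+)?,(\w+):([^:]+)?,(\w+):([^:]+)?";

    // Number of separate matches the pattern finds in the input.
    public static int CountMatches(string input)
    {
        return Regex.Matches(input, Pattern).Count;
    }

    public static void Main()
    {
        // A single item contains no commas, so the pattern cannot match.
        Console.WriteLine(CountMatches("CMD:PARM=X"));

        // Three items are consumed by one match, not three.
        Console.WriteLine(CountMatches(
            "LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127," +
            "CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP"));
    }
}
```

Because the pattern hard-codes three items, the number of items in the input has to be known in advance, which is what the request for one match per item is trying to avoid.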

But thanks for your advice. I will study what you supplied to try to
understand it as well. Thanks!

-- Alan
Jul 19 '05 #3
Try the following:

// requires: using System.Text.RegularExpressions;
Regex regex = new Regex(@"
(                 # overall repetition
  (?<Item>        # capture to Item
    (?<Tag>.*?)   # any character, zero or more times, non-greedy
    :             # literal :
    .*?           # any character, zero or more times, non-greedy
  )               # end of capture
  ,?              # optional "","". This eats the comma between the Items
  (?=             # zero-width lookahead. This must match at this spot
    (\w+:         # one or more word characters, followed by a literal :
    |             # or
    $             # end of string
    )
  )
)+                # one or more times",
    RegexOptions.ExplicitCapture |
    RegexOptions.Compiled |
    RegexOptions.Singleline |
    RegexOptions.IgnorePatternWhitespace);

The key to this is the zero-width lookahead. It ensures that the part after
the match is either <xxx>:, or the end of the string, without eating any of
the characters. As you've probably found, without this there's no way to
know whether you should include a comma or break on it.
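
As a rough sketch of how the results can be read back in code: because the whole pattern sits inside one repeated group, there is a single Match, and each Item and Tag is stored as a separate Capture on its named group. The class and method names below are hypothetical, and the pattern is the one above collapsed onto one line (so IgnorePatternWhitespace and the comments are dropped).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ItemCaptures
{
    // Same pattern as above, written without whitespace or comments.
    private static readonly Regex ItemRegex = new Regex(
        @"((?<Item>(?<Tag>.*?):.*?),?(?=(\w+:|$)))+",
        RegexOptions.ExplicitCapture | RegexOptions.Singleline);

    // Returns every capture recorded for the named group in the first match.
    public static List<string> CapturesOf(string input, string groupName)
    {
        var values = new List<string>();
        Match m = ItemRegex.Match(input);
        if (m.Success)
        {
            // Group.Captures keeps one entry per iteration of the outer loop.
            foreach (Capture c in m.Groups[groupName].Captures)
                values.Add(c.Value);
        }
        return values;
    }

    public static void Main()
    {
        string input = "xxx:a=b,c=d,yyy:e=f,zzz:www:g=h,i=j,l=m";
        foreach (string item in CapturesOf(input, "Item"))
            Console.WriteLine("Item => " + item);
        foreach (string tag in CapturesOf(input, "Tag"))
            Console.WriteLine("Tag => " + tag);
    }
}
```

Note that the Item capture still includes the tag and the colon, so an empty item such as zzz comes back as "zzz:".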

Here's the output I get from my regex workbench:

Matching:
LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127,CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP
Item => LTG:LTG=2-41-53-57
Item => JOB:JN=113&&116&125&&127
Item => CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP
Tag => LTG
Tag => JOB
Tag => CPT
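
If one Match per item is preferred (as in the "3 matches, 3 groups each" layout asked for earlier in the thread), the same lookahead idea can be used without the outer repetition. This is a variation, not the pattern posted above: the Body group, the use of \w+ for the tag, and the class name are assumptions made for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ItemMatches
{
    // One match per item: a tag, a colon, a lazy body, an optional comma,
    // and a lookahead that requires the next tag or the end of the string.
    private static readonly Regex PerItem = new Regex(
        @"(?<Tag>\w+):(?<Body>.*?),?(?=\w+:|$)",
        RegexOptions.Singleline);

    // Returns (tag, body) pairs in the order they appear in the input.
    public static List<KeyValuePair<string, string>> Split(string input)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in PerItem.Matches(input))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups["Tag"].Value, m.Groups["Body"].Value));
        }
        return result;
    }

    public static void Main()
    {
        foreach (var pair in Split("xxx:a=b,c=d,yyy:e=f,zzz:www:g=h,i=j,l=m"))
            Console.WriteLine(pair.Key + " [" + pair.Value + "]");
    }
}
```

Like the original, this assumes a value never contains a word immediately followed by a colon; such a value would be mistaken for the start of the next item.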

--
Eric Gunnerson

Visit the C# product team at http://www.csharp.net
Eric's blog is at http://blogs.gotdotnet.com/ericgu/
Jul 19 '05 #4
