Bytes | Software Development & Data Engineering Community

Regex puzzle

Can anyone help me figure out a regex pattern for the following input
example:

xxx:a=b,c=d,yyy:e=f,zzz:www:g=h,i=j,l=m

I would want four matches from this:
1. xxx a=b,c=d
2. yyy e=f
3. zzz (empty)
4. www g=h,i=j,l=m

None of the letters here are single letters, but rather placeholders for
arbitrary words. For example,

LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127,CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP

Would result in:
1. LTG LTG=2-41-53-57
2. JOB JN=113&&116&125&&127
3. CPT CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP

Everything I've come up with so far would require me to iterate over
substrings. It'd be nice to have just a single matching operation. TIA.

-- Alan
Nov 15 '05 #1

Give this one a try:

(?n)((?<item>[A-Za-z]+):(?<value>[A-Za-z]+=.*?)?(?=((,[A-Za-z]+:)|$)))+

For your input, it gives 3 matches, each with "item"
and "value" groups for what comes before and after the
colon.
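
A minimal sketch of how the pattern above could be driven from C#, assuming the two wrapped lines are joined into one. The class and method names (ItemValueDemo, Parse) are made up for illustration; only Regex.Matches and the named groups "item" and "value" come from the pattern itself.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ItemValueDemo
{
    // Pattern from the post, joined onto one line.
    // (?n) switches on ExplicitCapture, so only the named groups capture.
    private static readonly Regex Pattern = new Regex(
        @"(?n)((?<item>[A-Za-z]+):(?<value>[A-Za-z]+=.*?)?(?=((,[A-Za-z]+:)|$)))+");

    // Returns one (item, value) pair per match.
    // The value is "" when the optional value group did not take part in the match.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pattern.Matches(input))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups["item"].Value, m.Groups["value"].Value));
        }
        return result;
    }

    public static void Main()
    {
        string input = "LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127,"
                     + "CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP";
        int i = 1;
        foreach (KeyValuePair<string, string> pair in Parse(input))
        {
            Console.WriteLine("{0}. {1} {2}", i++, pair.Key, pair.Value);
        }
    }
}
```

On the LTG/JOB/CPT sample this should print the three lines asked for. On the xxx/yyy/zzz/www sample it appears to report only three items as well: the empty zzz: item is skipped, because the lookahead expects a comma before the next tag.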
Brian Davis
www.knowdotnet.com


Nov 15 '05 #2
"Brian Davis" <br***@knowdotnet.com> wrote in message
news:02****************************@phx.gbl...

Give this one a try:

(?n)((?<item>[A-Za-z]+):(?<value>[A-Za-z]+=.*?)?(?=((,[A-Za-z]+:)|$)))+

For your input, it gives 3 matches, each with "item"
and "value" groups for what comes before and after the
colon.


Brian, *very* impressive. It works beautifully. I changed the last term to
(?=(,[A-Za-z]+:)|$))+
since it looked like there were extraneous parentheses. You gave me much to
study. Thanks again.

-- Alan
Nov 15 '05 #3
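Something like this:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string constant = "LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127,CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP";
        // 'code' is a three-character tag; 'value' repeats over key=value pairs
        // and swallows the comma that follows each pair.
        Regex reg = new Regex(@"(?'code'\w\w\w):(?'value'[a-zA-Z0-9\-&/]+=[a-zA-Z0-9\-&/]+,?)*");
        MatchCollection coll = reg.Matches(constant);
        int i = 0;
        foreach (Match match in coll)
        {
            // match.Value is the whole item and may end with the separating comma, so trim it.
            Console.WriteLine(i++ + ". " + match.Groups["code"] + " -- " + match.Value.TrimEnd(','));
        }
    }
}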
Nov 15 '05 #4
How about?
(\w+):([^:]+)?,(\w+):([^:]+)?,(\w+):([^:]+)?

Go to http://www.organicbit.com/regex/fog0000000019.html and get the regex
tool; it's handy for building these things.

The tool helps while you are writing the regex, but it is cumbersome when you
want to verify that the regex matches correctly across a large set of inputs.
For that you would be better off with a unit-test app that stores an array of
(input, output) pairs, runs the regex on each input, and compares the result
to the expected output. (Example below)

-Dino
//
// emailValidation.cs
//
// uses a regexp to validate emails.
// This test program uses xml serialization to get the test input,
// including the regexp string and the various emails to test.
//
// references:
// http://homepage.stts.edu/~agushen/sc...alidation.html
//
// Fri, 15 Aug 2003 11:28
//

using Ionic.Test.EmailValidation;

namespace Ionic.Test.EmailValidation {

  /// <remarks>
  /// Represents all the input for the test, including the regex to test,
  /// and an array of test cases.
  /// </remarks>
  [System.Xml.Serialization.XmlRootAttribute("Email.Validation.Input", Namespace="", IsNullable=false)]
  public class TestInput {

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute(Form=System.Xml.Schema.XmlSchemaForm.Unqualified)]
    public string Regexp;

    /// <remarks/>
    [System.Xml.Serialization.XmlArrayAttribute(Form=System.Xml.Schema.XmlSchemaForm.Unqualified)]
    [System.Xml.Serialization.XmlArrayItemAttribute("Case", Form=System.Xml.Schema.XmlSchemaForm.Unqualified, IsNullable=false)]
    public TestCase[] TestList;
  }

  /// <remarks>
  /// This is the type that stores a single test case.
  /// We need a bunch of these to verify that the regex works as
  /// expected. Each test case has an input and an output. In our
  /// case, the input is a string, and the output is a bool value,
  /// which indicates whether the Regex should match or not.
  /// Other tests will have different input and output.
  /// </remarks>
  public class TestCase {

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute(Form=System.Xml.Schema.XmlSchemaForm.Unqualified)]
    public string Input;

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute(Form=System.Xml.Schema.XmlSchemaForm.Unqualified)]
    public bool ExpectedOutput;
  }

  /// <remarks>
  /// This is the test app. The main routine de-serializes from
  /// an XML file, then runs the tests, comparing the expected
  /// (or desired) output with the actual result.
  /// </remarks>
  public class Tester {

    public static void Main() {
      string InputPath= "EmailValidationInput.xml";

      System.IO.FileStream fs = new System.IO.FileStream(InputPath, System.IO.FileMode.Open);
      System.Xml.Serialization.XmlSerializer s= new System.Xml.Serialization.XmlSerializer(typeof(TestInput));
      TestInput Input= (TestInput) s.Deserialize(fs);
      fs.Close();

      System.Text.RegularExpressions.Regex regex= new System.Text.RegularExpressions.Regex(Input.Regexp);

      foreach (TestCase tc in Input.TestList) {
        System.Console.WriteLine(tc.Input +"\n " + tc.ExpectedOutput + " \\ " + regex.IsMatch(tc.Input));
      }
    }
  }
}
This is input data. Store this in the XML file that is de-serialized for
this test.

<Email.Validation.Input>
  <TestList>
    <!-- ================================================================== -->
    <!-- =================== True test cases ============================== -->
    <!-- ================================================================== -->

    <Case>
      <Input>Ro***@rabbit.com</Input>
      <ExpectedOutput>true</ExpectedOutput>
    </Case>
    <Case>
      <Input>th*********************************@something.org</Input>
      <ExpectedOutput>true</ExpectedOutput>
    </Case>
    <Case>
      <Input>th*******@something.9g</Input>
      <ExpectedOutput>true</ExpectedOutput>
    </Case>
    <Case>
      <Input>th*******@place.org</Input>
      <ExpectedOutput>true</ExpectedOutput>
    </Case>
    <Case>
      <Input>We***********@cornell.edu</Input>
      <ExpectedOutput>true</ExpectedOutput>
    </Case>
    <Case>
      <Input>Ja***********@sun-east.com</Input>
      <ExpectedOutput>true</ExpectedOutput>
    </Case>
    <Case>
      <Input>Ja***********@sun.east.com</Input>
      <ExpectedOutput>true</ExpectedOutput>
    </Case>
    <Case>
      <Input>Ja***********@sun.com</Input>
      <ExpectedOutput>true</ExpectedOutput>
    </Case>
    <Case>
      <Input>Pr*******@rolling-hills.club.org</Input>
      <ExpectedOutput>true</ExpectedOutput>
    </Case>
    <Case>
      <Input>9L****@club.org</Input>
      <ExpectedOutput>true</ExpectedOutput>
    </Case>
    <Case>
      <Input>fr**@somewhere.org9</Input>
      <ExpectedOutput>true</ExpectedOutput>
    </Case>
    <Case>
      <Input>f@z.k</Input>
      <ExpectedOutput>true</ExpectedOutput>
    </Case>
    <Case>
      <Input>_e***@sesame.org</Input>
      <ExpectedOutput>true</ExpectedOutput>
    </Case>
    <Case>
      <Input>Ha**********@Hogwarts.edu</Input>
      <ExpectedOutput>true</ExpectedOutput>
    </Case>
    <Case>
      <Input>Pr************************@Faculty.Hogwarts.edu</Input>
      <ExpectedOutput>true</ExpectedOutput>
    </Case>

    <!-- ================================================================== -->
    <!-- =================== False test cases ============================= -->
    <!-- ================================================================== -->

    <Case>
      <Input>-e***@sesame.org</Input>
      <ExpectedOutput>false</ExpectedOutput>
    </Case>
    <Case>
      <Input>el**@sesame.org.</Input>
      <ExpectedOutput>false</ExpectedOutput>
    </Case>
    <Case>
      <Input>-e***@sesame.org.</Input>
      <ExpectedOutput>false</ExpectedOutput>
    </Case>
    <Case>
      <Input>elmo@.org.</Input>
      <ExpectedOutput>false</ExpectedOutput>
    </Case>
    <Case>
      <Input>elmo@.org</Input>
      <ExpectedOutput>false</ExpectedOutput>
    </Case>
    <Case>
      <Input>elmo@.someplace.org</Input>
      <ExpectedOutput>false</ExpectedOutput>
    </Case>
    <Case>
      <Input>elmo@cloud9</Input>
      <ExpectedOutput>false</ExpectedOutput>
    </Case>
    <Case>
      <Input>fred.@somewhere.org9</Input>
      <ExpectedOutput>false</ExpectedOutput>
    </Case>
    <Case>
      <Input>fred@somewhere..org9</Input>
      <ExpectedOutput>false</ExpectedOutput>
    </Case>
    <Case>
      <Input>9Lives.club.org</Input>
      <ExpectedOutput>false</ExpectedOutput>
    </Case>
    <Case>
      <Input>@club.org</Input>
      <ExpectedOutput>false</ExpectedOutput>
    </Case>
    <Case>
      <Input>.so*****@club.org</Input>
      <ExpectedOutput>false</ExpectedOutput>
    </Case>

  </TestList>
  <Regexp>^(\w([\.\-\w]*\w)?)@(\w([\.\-\w]*\w)*\.\w([\.\-\w]*\w)?)$</Regexp>
</Email.Validation.Input>
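
For a quick check without the XML file, the same table-driven idea can be sketched with the (input, expected) pairs held in arrays. This is only an illustration: InlineRegexTest and CountFailures are hypothetical names, and the five inputs are the unmasked ones from the test data above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InlineRegexTest
{
    // Regexp copied from the XML input above.
    public const string Pattern =
        @"^(\w([\.\-\w]*\w)?)@(\w([\.\-\w]*\w)*\.\w([\.\-\w]*\w)?)$";

    // Runs the pattern over every input and returns the number of cases
    // whose actual IsMatch result differs from the expected one.
    public static int CountFailures(string pattern, string[] inputs, bool[] expected)
    {
        Regex regex = new Regex(pattern);
        int failures = 0;
        for (int i = 0; i < inputs.Length; i++)
        {
            bool actual = regex.IsMatch(inputs[i]);
            if (actual != expected[i])
            {
                failures++;
                Console.WriteLine("FAIL: {0} expected {1}, got {2}",
                                  inputs[i], expected[i], actual);
            }
        }
        return failures;
    }

    public static void Main()
    {
        string[] inputs   = { "f@z.k", "elmo@cloud9", "@club.org", "elmo@.org", "9Lives.club.org" };
        bool[]   expected = { true,    false,         false,       false,       false };
        Console.WriteLine("{0} failure(s)", CountFailures(Pattern, inputs, expected));
    }
}
```

Returning a failure count rather than only printing makes it easy to call the check from a build script or a larger test runner.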

Nov 15 '05 #5
"Dino Chiesa [MSFT]" <di****@microsoft.com> wrote in message
news:uU**************@tk2msftngp13.phx.gbl...
How about?
(\w+):([^:]+)?,(\w+):([^:]+)?,(\w+):([^:]+)?


Dino,

Your regex fails (no match) with a simple test, CMD:PARM=X, and I didn't
have much luck with others I tried. For example, my OP had this example,

LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127,CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP

Your regex gives this result:
1 matches.
Match 1 has 7 groups.
Group 1 = "LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127,CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP"
Group 2 = "LTG"
Group 3 = "LTG=2-41-53-57"
Group 4 = "JOB"
Group 5 = "JN=113&&116&125&&127"
Group 6 = "CPT"
Group 7 = "CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP"

But I was looking for something more along the lines of (Group 2 & 3 in each
match are the desired values):
3 matches.
Match 1 has 3 groups.
Group 1 = "LTG:LTG=2-41-53-57"
Group 2 = "LTG"
Group 3 = "LTG=2-41-53-57"
Match 2 has 3 groups.
Group 1 = "JOB:JN=113&&116&125&&127"
Group 2 = "JOB"
Group 3 = "JN=113&&116&125&&127"
Match 3 has 3 groups.
Group 1 = "CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP"
Group 2 = "CPT"
Group 3 = "CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP"

But thanks for your advice. I will study what you supplied to try to
understand it as well. Thanks!
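
The behaviour described above can be reproduced with a few lines. This is a rough sketch; the class name FixedCountDemo and its helpers are invented for the example. The pattern spells out exactly three item:value pairs joined by two commas, so an input with a single pair cannot match, and an input with three pairs comes back as one match rather than three.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FixedCountDemo
{
    // Pattern quoted above: three pairs, two literal commas between them.
    public const string ThreePairs = @"(\w+):([^:]+)?,(\w+):([^:]+)?,(\w+):([^:]+)?";

    // True when the pattern matches anywhere in the input.
    public static bool IsMatch(string input)
    {
        return Regex.IsMatch(input, ThreePairs);
    }

    // Number of separate matches the pattern finds in the input.
    public static int CountMatches(string input)
    {
        return Regex.Matches(input, ThreePairs).Count;
    }

    public static void Main()
    {
        string three = "LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127,"
                     + "CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP";
        Console.WriteLine(IsMatch("CMD:PARM=X"));   // no comma to satisfy the pattern
        Console.WriteLine(CountMatches(three));     // the whole input is a single match
    }
}
```

A pattern that describes one item and is applied repeatedly with Regex.Matches avoids tying the regex to a fixed number of items.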

-- Alan
Nov 15 '05 #6
Try the following:

Regex regex = new Regex(@"
    (                # overall repetition
      (?<Item>       # capture to Item
        (?<Tag>.*?)  # any character, zero or more times, non-greedy
        :            # literal :
        .*?          # any character, zero or more times, non-greedy
      )              # end of capture
      ,?             # optional "","". This eats the comma between the Items
      (?=            # zero-width lookahead. This must match at this spot
        (\w+:        # one or more word characters, followed by a literal :
        |            # or
        $            # end of string
        )
      )
    )+               # one or more times",
    RegexOptions.ExplicitCapture |
    RegexOptions.Compiled |
    RegexOptions.Singleline |
    RegexOptions.IgnorePatternWhitespace);

The key to this is the zero-width lookahead. It ensures that the part after
the match is either <xxx>:, or the end of the string, without eating any of
the characters. As you've probably found, without this there's no way to
know whether you should include a comma or break on it.

Here's the output I get from my regex workbench:

Matching:
LTG:LTG=2-41-53-57,JOB:JN=113&&116&125&&127,CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP
Item => LTG:LTG=2-41-53-57
Item => JOB:JN=113&&116&125&&127
Item => CPT:CODE=09789,TRATYP=AMBINC-7-AMBINC/CPTGRP-0-CPTGRP
Tag => LTG
Tag => JOB
Tag => CPT
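
Because the whole input is consumed by one match whose groups repeat, the individual items live in each group's Captures collection rather than in separate Match objects. A small sketch of reading them out follows; the class and method names are invented, the pattern is the one above with lightly reworded comments, and RegexOptions.Compiled is left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    private static readonly Regex Pattern = new Regex(@"
        (                # overall repetition
          (?<Item>       # capture one tag:value item
            (?<Tag>.*?)  # tag, as few characters as possible
            :            # literal colon
            .*?          # value, as few characters as possible
          )
          ,?             # optional comma between items
          (?=            # lookahead, consumes nothing
            (\w+:        # either another tag with its colon
            |            # or
            $            # the end of the input
            )
          )
        )+",
        RegexOptions.ExplicitCapture |
        RegexOptions.Singleline |
        RegexOptions.IgnorePatternWhitespace);

    // Collects every capture of the named group across all matches, in order.
    public static List<string> Captures(string input, string groupName)
    {
        var result = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            foreach (Capture c in m.Groups[groupName].Captures)
            {
                result.Add(c.Value);
            }
        }
        return result;
    }

    public static void Main()
    {
        string input = "xxx:a=b,c=d,yyy:e=f,zzz:www:g=h,i=j,l=m";
        foreach (string item in Captures(input, "Item"))
            Console.WriteLine("Item => " + item);
        foreach (string tag in Captures(input, "Tag"))
            Console.WriteLine("Tag => " + tag);
    }
}
```

With the first sample from the question this should list four items, including the empty zzz: one, since the lookahead accepts a tag that follows directly without a comma.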

--
Eric Gunnerson

Visit the C# product team at http://www.csharp.net
Eric's blog is at http://blogs.gotdotnet.com/ericgu/

This posting is provided "AS IS" with no warranties, and confers no rights.

Nov 15 '05 #7
