
regular expression help

Hi, I'm not sure that this is the right forum for this, but I've been having
a very tough time completing this expression, and I was hoping someone might
have some suggestions for me.
I am trying to read measurements out of a text description, and I have a
working expression, but it captures a pile of empty matches. I obviously am
not interested in them, but I screw up my functionality when I try to get
rid of them.

My expression is:
(?:(?:(?<Feet>[0-9]*)\'){0,1}(?:(?:(?<WholeInches>[0-9]*(?![/\w])){0,1}(?:[ ,\-]){0,1}(?<Fraction>[0-9]*\/[0-9]*){0,1}(?<Decimal>\d*\.\d*){0,1}\")){0,1})

Some test strings are:
1/4" x 2" Flat 44W x 20'
1 1/4" x 2" Flat 44W x 20'
1/4" x 2.5" Flat 44W x 20'
1/4" x 2" Flat 44W x 20' 3"
1/4" x 2" Flat 44W x 20' 3.5"
1/4" x 2" Flat 44W x 20' 1/2"
1/8" x 4" C-1018 flat x 14' 5-1/4"

I really could use some help on this. I've been working on this on and off
for several months now, and just can't seem to get it right.
Dec 10 '05 #1


Sorry, it's been a hectic day... I didn't finish my post, but somehow
managed to send it anyway....

In the strings, there are always random numbers, and I want them ignored.
I only want matches on the measurements which can be written about a million
different ways. This is for pulling data out of a legacy inventory
application.

Any thoughts or suggestions would be very, very much appreciated. Right
now, my app uses this expression, and removes matches to the empty groups,
but this is just not how it should work.

Thanks,
Trevor_B
Dec 10 '05 #2

shoot me an email and i'll work with you on these. there's no need to
flood a C# newsgroup with a bunch of back and forth messages about
regular expressions, when they're just between you and me.

send me a long list of the test strings and i'll see what i can do for
you. i've never written a regular expression this complicated and i
would love to give it a try.

jeremiah
Dec 11 '05 #3

I disagree... regular expressions are fun.

-Marc N.
Dec 11 '05 #4

Hey trevor,

It may be easier to write multiple regex strings than one large regex
string capable of handling all situations. There is always going to be
a legacy string that will fail your regex. So, instead have a set of
regex strings that you will loop through and try to match. If no match
is found, then you know you need to create a new regex.

It's like a bunch of security check points. If it fails one, then it
goes through another checkpoint. Having one large centralized
checkpoint can cause a lot of complications.

Give it a whirl because sometimes it's easier to have a bunch of little
tasks than one large complicated task.
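The checkpoint idea could be sketched roughly like this. It is only an illustration: the four patterns are hypothetical, not taken from the original expression, and it assumes each candidate measurement has already been isolated from the description before it is tested.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Checkpoints
{
    // Hypothetical patterns; each one handles a single way of writing a measurement.
    static readonly Regex[] Patterns =
    {
        new Regex(@"^\d+'\s+\d+(?:\.\d+)?""$"),  // feet and inches: 20' 3" or 20' 3.5"
        new Regex(@"^\d+'$"),                    // feet only: 20'
        new Regex(@"^(?:\d+[ -])?\d+/\d+""$"),   // fractional inches: 1/4", 1 1/4", 5-1/4"
        new Regex(@"^\d+(?:\.\d+)?""$"),         // whole or decimal inches: 2", 2.5"
    };

    // Returns the index of the first pattern that matches, or -1 if none does.
    public static int FirstMatchingPattern(string text)
    {
        for (int i = 0; i < Patterns.Length; i++)
            if (Patterns[i].IsMatch(text))
                return i;
        return -1;
    }

    public static void Main()
    {
        string[] candidates = { "20'", "1 1/4\"", "2.5\"", "44W" };
        foreach (string candidate in candidates)
        {
            int index = FirstMatchingPattern(candidate);
            Console.WriteLine(index >= 0
                ? candidate + " passed checkpoint " + index
                : candidate + " matched nothing; a new pattern may be needed");
        }
    }
}
```

A result of -1 is the signal that a legacy string has shown up in a form none of the small patterns covers yet.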

josh

Dec 11 '05 #5

P: n/a
In article <Om*************@TK2MSFTNGP12.phx.gbl>,
Trevor Braun <tb***********@codetrue.com> wrote:

: Hi, I'm not sure that this is the right forum for this, but I've been
: having a very tough time completing this expression, and I was hoping
: someone might have some suggestions for me.
: I am trying to read measurements out of a text description, and I have
: a working expression, but it captures a pile of empty matches. I
: obviously am not interested in them, but I screw up my functionality
: when I try to get rid of them.
:
: My expression is:
: [snipped]
:
: Some test strings are:
: 1/4" x 2" Flat 44W x 20'
: 1 1/4" x 2" Flat 44W x 20'
: 1/4" x 2.5" Flat 44W x 20'
: 1/4" x 2" Flat 44W x 20' 3"
: 1/4" x 2" Flat 44W x 20' 3.5"
: 1/4" x 2" Flat 44W x 20' 1/2"
: 1/8" x 4" C-1018 flat x 14' 5-1/4"
:
: I really could use some help on this. I've been working on this on and
: off for several months now, and just can't seem to get it right.

One easy suggestion is that you can write "{0,1}" more succinctly as
"?", e.g., "a{0,1}" and "a?" are equivalent.

If you want to insist that one of the groups matches, then say what
you mean. Remember that the ? and * quantifiers *always* succeed
because they can match nothing.
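Both points can be checked with a small sketch. The class and helper names here are made up for illustration, and the patterns are cut-down versions of the feet subpattern rather than the full original expression.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // True when both patterns match the same text at the first match position.
    public static bool SameMatch(string patternA, string patternB, string input)
    {
        return Regex.Match(input, patternA).Value == Regex.Match(input, patternB).Value;
    }

    // Counts the matches that consumed no characters at all.
    public static int CountEmpty(string pattern, string input)
    {
        int empty = 0;
        foreach (Match m in Regex.Matches(input, pattern))
            if (m.Length == 0)
                empty++;
        return empty;
    }

    public static void Main()
    {
        string input = "44W x 20'";

        // {0,1} and ? are two spellings of the same quantifier.
        Console.WriteLine(SameMatch(@"(?:[0-9]*'){0,1}", @"(?:[0-9]*')?", input));

        // Everything optional: the pattern also "succeeds" where there is no measurement.
        Console.WriteLine(CountEmpty(@"(?:\d*')?", input));

        // At least one digit and the foot mark required: no empty matches.
        Console.WriteLine(CountEmpty(@"\d+'", input));
    }
}
```

The fully optional version reports empty matches at positions where it found nothing, which is the pile of empty matches described in the question.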

For complex patterns, I like to use RegexOptions.IgnorePatternWhitespace,
which lets you spread the pattern over several lines.
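As a rough sketch of what that option buys you, here is the same small pattern written twice. The pattern and the names Compact and Spaced are invented for this illustration. With the option on, unescaped whitespace in the pattern is ignored and # starts a comment that runs to the end of the line, so a space you actually want to match has to be written as \s or escaped.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Compact form: everything on one line.
    public static readonly Regex Compact =
        new Regex(@"(?<Feet>\d+)'\s*(?<Inches>\d+)""");

    // Same pattern, laid out with IgnorePatternWhitespace.
    public static readonly Regex Spaced = new Regex(@"
        (?<Feet>   \d+ ) '     # feet value, then the foot mark
        \s*                    # optional gap; a literal space would be ignored here
        (?<Inches> \d+ ) ""    # inch value, then the inch mark
        ", RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        string input = "20' 3\"";
        foreach (Regex regex in new[] { Compact, Spaced })
        {
            Match m = regex.Match(input);
            Console.WriteLine("Feet=[{0}] Inches=[{1}]",
                m.Groups["Feet"].Value, m.Groups["Inches"].Value);
        }
    }
}
```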

Your subpatterns are inconsistent, e.g., some included the unit and
some didn't, and even with your followup, I may not be clear on what
you're trying to capture.

Take a look at the code below. Note how the pattern requires one of
the alternatives to match non-empty strings.

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // Every alternative requires at least one digit, so no match can be empty.
        // Inside the verbatim string, "" stands for a single inch mark.
        Regex measurements = new Regex(
            @"
            (?<Fraction>    (\d+\s+)?\d+/\d+"" ) |
            (?<Decimal>     \d+\.\d+""         ) |
            (?<Feet>        \d+'               ) |
            (?<WholeInches> \d+(?![/\w])       )
            ",
            RegexOptions.IgnorePatternWhitespace |
            RegexOptions.ExplicitCapture);

        string[] inputs = {
            "1/4\" x 2\" Flat 44W x 20'",
            "1 1/4\" x 2\" Flat 44W x 20'",
            "1/4\" x 2.5\" Flat 44W x 20'",
            "1/4\" x 2\" Flat 44W x 20' 3\"",
            "1/4\" x 2\" Flat 44W x 20' 3.5\"",
            "1/4\" x 2\" Flat 44W x 20' 1/2\"",
            "1/8\" x 4\" C-1018 flat x 14' 5-1/4\"",
        };

        string[] groups = {
            "Feet", "WholeInches", "Fraction", "Decimal",
        };

        foreach (string input in inputs)
        {
            Console.WriteLine("[" + input + "]:");

            // Print every match, showing which named group captured it.
            int count = 1;
            foreach (Match m in measurements.Matches(input))
            {
                Console.WriteLine(" - {0}:", count++);

                foreach (string group in groups)
                    Console.WriteLine(" - {0}: [{1}]",
                        group, m.Groups[group].Value);
            }
        }
    }
}

Is it at least a start in the right direction? Should an input such
as [20' 3"] produce one match or two (one for the feet component and
one for the inches component)? What else needs fixing?
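To make the one-match-or-two question concrete, here is a sketch of both behaviours side by side. Both patterns are simplified stand-ins invented for this comparison (feet plus optional whole or decimal inches only, no fractions), not the pattern from the code above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FeetInchesDemo
{
    // Alternation: the feet part and the inches part are separate matches.
    public static readonly Regex Separate =
        new Regex(@"\d+'|\d+(?:\.\d+)?""");

    // Hypothetical combined form: feet, optionally followed by inches, as one match.
    public static readonly Regex Combined =
        new Regex(@"(?<Feet>\d+)'(?:\s+(?<Inches>\d+(?:\.\d+)?)"")?");

    public static int CountMatches(Regex regex, string input)
    {
        return regex.Matches(input).Count;
    }

    public static void Main()
    {
        string input = "20' 3\"";
        Console.WriteLine("Separate: {0} match(es)", CountMatches(Separate, input));
        Console.WriteLine("Combined: {0} match(es)", CountMatches(Combined, input));

        Match m = Combined.Match(input);
        Console.WriteLine("Feet=[{0}] Inches=[{1}]",
            m.Groups["Feet"].Value, m.Groups["Inches"].Value);
    }
}
```

The combined form keeps the feet and inches of one dimension together, at the cost of needing a further alternative for inches that appear without any feet.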

I agree with Mark Noon: regular expressions are fun, so I look forward
to hearing back from you.

Hope this helps,
Greg
Dec 14 '05 #6
