
A better RE?

Magnus Lycka
#1: Mar 9 '06
I want an re that matches strings like "21MAR06 31APR06 1236",
where the last part is day numbers (1-7), i.e. it can contain
the numbers 1-7, in order, only one of each, and at least one
digit. I want it as three groups. I was thinking of

r"(\d\d[A-Z]\d\d) (\d\d[A-Z]\d\d) (1?2?3?4?5?6?7?)"

but that will match even if the third group is empty,
right? Does anyone have a good and not overly complex RE for
this?

P.S. I know the "now you have two problems" reply...

Fredrik Lundh
#2: Mar 9 '06

re: A better RE?


Magnus Lycka wrote:
> I want an re that matches strings like "21MAR06 31APR06 1236",
> where the last part is day numbers (1-7), i.e. it can contain
> the numbers 1-7, in order, only one of each, and at least one
> digit. I want it as three groups. I was thinking of
>
> r"(\d\d[A-Z]\d\d) (\d\d[A-Z]\d\d) (1?2?3?4?5?6?7?)"
>
> but that will match even if the third group is empty,
> right? Does anyone have a good and not overly complex RE for
> this?

how about (untested)

r"(\d\d[A-Z]{3}\d\d) (\d\d[A-Z]{3}\d\d) (?=[1234567])(1?2?3?4?5?6?7?)"

where {3} means require three copies of the previous RE part, and
(?=[1234567]) means require at least one of 1-7, but don't move
forward if it matches.

</F>
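
A minimal sketch of how the lookahead behaves, written for Python 3 so that `re.fullmatch` is available. The `parse` helper is a hypothetical name introduced for illustration, not something from the thread. The lookahead rejects an empty third group, but `match` on its own does not anchor the end of the string, so an out-of-order tail such as 1362 would still match with a shortened third group unless the whole string is required to match.

```python
import re

# The pattern suggested above; [1-7] is equivalent to [1234567].
PATTERN = re.compile(
    r"(\d\d[A-Z]{3}\d\d) (\d\d[A-Z]{3}\d\d) (?=[1-7])(1?2?3?4?5?6?7?)"
)


def parse(line):
    """Return the three groups, or None unless the whole line matches."""
    m = PATTERN.fullmatch(line)
    return m.groups() if m else None


print(parse("21MAR06 31APR06 1236"))  # -> ('21MAR06', '31APR06', '1236')
print(parse("21MAR06 31APR06 "))      # -> None (lookahead needs a digit 1-7)
print(parse("21MAR06 31APR06 1362"))  # -> None (digits out of order)

# Without end anchoring, match() stops where the ordered run ends:
print(PATTERN.match("21MAR06 31APR06 1362").group(3))  # -> 136
```

Appending `$` to the pattern has much the same effect as `fullmatch` here.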



Schüle Daniel
#3: Mar 9 '06

re: A better RE?


Magnus Lycka wrote:
> I want an re that matches strings like "21MAR06 31APR06 1236",
> where the last part is day numbers (1-7), i.e. it can contain
> the numbers 1-7, in order, only one of each, and at least one
> digit. I want it as three groups. I was thinking of
>
> r"(\d\d[A-Z]\d\d) (\d\d[A-Z]\d\d) (1?2?3?4?5?6?7?)"
>
> but that will match even if the third group is empty,
> right? Does anyone have a good and not overly complex RE for
> this?
>
> P.S. I know the "now you have two problems" reply...

>>> import re
>>> txt = "21MAR06 31APR06 1236"
>>> m = '(?:JAN|FEB|MAR|APR|MAI|JUN|JUL|AUG|SEP|OCT|NOV|DEZ)'
# non capturing group (?:)
>>> p = re.compile(r"(\d\d%s\d\d) (\d\d%s\d\d) (?=[1234567])(1?2?3?4?5?6?7?)" % (m,m))
>>> p.match(txt).group(1)
'21MAR06'
>>> p.match(txt).group(2)
'31APR06'
>>> p.match(txt).group(3)
'1236'

bruno at modulix
#4: Mar 10 '06

re: A better RE?


Magnus Lycka wrote:
> I want an re that matches strings like "21MAR06 31APR06 1236",
> where the last part is day numbers (1-7), i.e. it can contain
> the numbers 1-7, in order, only one of each, and at least one
> digit. I want it as three groups. I was thinking of
>
> r"(\d\d[A-Z]\d\d) (\d\d[A-Z]\d\d) (1?2?3?4?5?6?7?)"
>
> but that will match even if the third group is empty,
> right? Does anyone have a good and not overly complex RE for
> this?

Simplest:
>>> import re
>>> s = "21MAR06 31APR06 1236"
>>> exp = r"(\d{2}[A-Z]{3}\d{2}) (\d{2}[A-Z]{3}\d{2}) (\d+)"
>>> re.match(exp, s).groups()
('21MAR06', '31APR06', '1236')

but this could give you false positives, depending on the real data.

If you want to be as strict as possible, this becomes a little bit hairy.
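
To make the false-positive risk concrete, here is a rough Python 3 comparison of the simple pattern with a stricter one that borrows the lookahead idea from earlier in the thread. The names `LOOSE`, `STRICT` and `accepts` are made up for this sketch, and the sample strings are invented test data.

```python
import re

DATES = r"(\d{2}[A-Z]{3}\d{2}) (\d{2}[A-Z]{3}\d{2}) "

# Simple version: any run of digits is accepted as the third group.
LOOSE = re.compile(DATES + r"(\d+)")

# Stricter version: at least one digit, only 1-7, ascending, no repeats.
STRICT = re.compile(DATES + r"(?=[1-7])(1?2?3?4?5?6?7?)$")


def accepts(pattern, s):
    """True if the pattern matches s from the start of the string."""
    return pattern.match(s) is not None


for tail in ["1236", "1362", "1123", "8899"]:
    s = "21MAR06 31APR06 " + tail
    print(tail, accepts(LOOSE, s), accepts(STRICT, s))
```

Only the first sample passes the strict pattern; the loose one accepts all four.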
> P.S. I know the "now you have two problems" reply...

!-)

--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'onurb@xiludom.gro'.split('@')])"
Eddie Corns
#5: Mar 10 '06

re: A better RE?


Magnus Lycka <lycka@carmen.se> writes:
>I want an re that matches strings like "21MAR06 31APR06 1236",
>where the last part is day numbers (1-7), i.e. it can contain
>the numbers 1-7, in order, only one of each, and at least one
>digit. I want it as three groups. I was thinking of

Just a small point - what does "in order" mean here? if it means that eg 1362
is not valid then you're stuck because it's context sensitive and hence not
regular.

I can't see how any of the fancy extensions could help here but maybe I'm just
lacking insight.

Now if "[\1-7]" worked you'd be home and dry.

Eddie
Fredrik Lundh
#6: Mar 10 '06

re: A better RE?


Eddie Corns wrote:

> >I want an re that matches strings like "21MAR06 31APR06 1236",
> >where the last part is day numbers (1-7), i.e. it can contain
> >the numbers 1-7, in order, only one of each, and at least one
> >digit. I want it as three groups. I was thinking of
>
> Just a small point - what does "in order" mean here? if it means that eg 1362
> is not valid then you're stuck because it's context sensitive and hence not
> regular.
>
> I can't see how any of the fancy extensions could help here but maybe I'm
> just lacking insight.

import re

p = re.compile("(?=[1234567])(1?2?3?4?5?6?7?)$")

def test(s):
    m = p.match(s)
    print repr(s), "=>", m and m.groups() or "none"

test("")
test("1236")
test("1362")
test("12345678")

prints

'' => none
'1236' => ('1236',)
'1362' => none
'12345678' => none

</F>



Jim
#7: Mar 10 '06

re: A better RE?



Eddie Corns wrote:
> Just a small point - what does "in order" mean here? if it means that eg 1362
> is not valid then you're stuck because it's context sensitive and hence not
> regular.
I'm not seeing that. Any finite language is regular -- as a last
resort you could list all ascending sequences of 7 or fewer digits (but
perhaps I misunderstood the original poster's requirements).
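
The "list them all" fallback can be sketched in a few lines of Python 3. There are 2**7 - 1 = 127 non-empty ascending selections from the digits 1-7, so the language is finite and a plain alternation covers it. The names `sequences`, `ENUMERATED` and `LOOKAHEAD` are chosen for this illustration only.

```python
import re
from itertools import combinations

DIGITS = "1234567"

# combinations() keeps the input order, so every result is ascending
# and contains each digit at most once.
sequences = [
    "".join(combo)
    for size in range(1, len(DIGITS) + 1)
    for combo in combinations(DIGITS, size)
]

# One big alternation listing every allowed days-of-week field.
ENUMERATED = re.compile("|".join(sequences))

# The compact form discussed in this thread, for comparison.
LOOKAHEAD = re.compile(r"(?=[1-7])1?2?3?4?5?6?7?")

print(len(sequences))                            # -> 127
print(ENUMERATED.fullmatch("1236") is not None)  # -> True
print(ENUMERATED.fullmatch("1362") is not None)  # -> False
```

`fullmatch` is used so that the order of the alternatives does not matter.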

Jim

Eddie Corns
#8: Mar 10 '06

re: A better RE?


"Fredrik Lundh" <fredrik@pythonware.com> writes:
>Eddie Corns wrote:

>> >I want an re that matches strings like "21MAR06 31APR06 1236",
>> >where the last part is day numbers (1-7), i.e. it can contain
>> >the numbers 1-7, in order, only one of each, and at least one
>> >digit. I want it as three groups. I was thinking of
>>
>> Just a small point - what does "in order" mean here? if it means that eg 1362
>> is not valid then you're stuck because it's context sensitive and hence not
>> regular.
>>
>> I can't see how any of the fancy extensions could help here but maybe I'm
>> just lacking insight.

>import re

>p = re.compile("(?=[1234567])(1?2?3?4?5?6?7?)$")

>def test(s):
>    m = p.match(s)
>    print repr(s), "=>", m and m.groups() or "none"

>test("")
>test("1236")
>test("1362")
>test("12345678")

>prints

>'' => none
>'1236' => ('1236',)
>'1362' => none
>'12345678' => none

></F>

I know I know! I cancelled the article about a minute after posting it.

Eddie
Eddie Corns
#9: Mar 10 '06

re: A better RE?


"Jim" <jhefferon@smcvt.edu> writes:

>Eddie Corns wrote:
>> Just a small point - what does "in order" mean here? if it means that eg 1362
>> is not valid then you're stuck because it's context sensitive and hence not
>> regular.
>I'm not seeing that. Any finite language is regular -- as a last
>resort you could list all ascending sequences of 7 or fewer digits (but
>perhaps I misunderstood the original poster's requirements).

No, that's what I did. Just carelessness on my part, time I had a holiday!

Eddie

Paul McGuire
#10: Mar 10 '06

re: A better RE?


"Magnus Lycka" <lycka@carmen.se> wrote in message
news:duq0cj$7ih$1@wake.carmen.se...
> I want an re that matches strings like "21MAR06 31APR06 1236",
> where the last part is day numbers (1-7), i.e. it can contain
> the numbers 1-7, in order, only one of each, and at least one
> digit. I want it as three groups. I was thinking of
>
> r"(\d\d[A-Z]\d\d) (\d\d[A-Z]\d\d) (1?2?3?4?5?6?7?)"
>
> but that will match even if the third group is empty,
> right? Does anyone have a good and not overly complex RE for
> this?
>
> P.S. I know the "now you have two problems" reply...

For the pyparsing-inclined, here are two versions, along with several
examples on how to extract the fields from the returned ParseResults object.
The second version is more rigorous in enforcing the days-of-week rules on
the 3rd field.

Note that the month field is already limited to valid month abbreviations,
and the same technique used to validate the days-of-week field could be used
to ensure that the date fields are valid dates (no 31st of FEB, etc.), that
the second date is after the first, etc.

-- Paul
Download pyparsing at http://pyparsing.sourceforge.net.


data = "21MAR06 31APR06 1236"
data2 = "21MAR06 31APR06 1362"

from pyparsing import *

# define format of an entry
month = oneOf("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC")
date = Combine( Word(nums,exact=2) + month + Word(nums,exact=2) )
daysOfWeek = Word("1234567")
entry = date.setResultsName("startDate") + \
        date.setResultsName("endDate") + \
        daysOfWeek.setResultsName("weekDays") + \
        lineEnd

# extract entry data
e = entry.parseString(data)

# various ways to access the results
print e.startDate, e.endDate, e.weekDays
print "%(startDate)s : %(endDate)s : %(weekDays)s" % e
print e.asList()
print e
print

# get more rigorous in testing for valid days of week field
def rigorousDayOfWeekTest(s,l,toks):
    # remove duplicates from toks[0], sort, then compare to original
    tmp = "".join(sorted(dict([(ll,0) for ll in toks[0]]).keys()))
    if tmp != toks[0]:
        raise ParseException(s,l,"Invalid days of week field")

daysOfWeek.setParseAction(rigorousDayOfWeekTest)
entry = date.setResultsName("startDate") + \
        date.setResultsName("endDate") + \
        daysOfWeek.setResultsName("weekDays") + \
        lineEnd

print entry.parseString(data)
print entry.parseString(data2) # <-- raises ParseException
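
The date checks mentioned above (no 31st of FEB, second date not before the first) can also be sketched with only the standard library, independently of pyparsing. This is a Python 3 illustration under stated assumptions: `parse_date` and `check_range` are hypothetical helper names, two-digit years are assumed to mean 20xx, and a range whose start and end fall on the same day is allowed.

```python
import re
from datetime import date

MONTH_NAMES = "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split()
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}
DATE_RE = re.compile(r"(\d\d)([A-Z]{3})(\d\d)")


def parse_date(text):
    """Turn e.g. '21MAR06' into a date; raise ValueError if impossible."""
    m = DATE_RE.fullmatch(text)
    if m is None or m.group(2) not in MONTHS:
        raise ValueError("not a DDMMMYY date: %r" % text)
    day = int(m.group(1))
    month = MONTHS[m.group(2)]
    year = 2000 + int(m.group(3))  # assumption: all years are 20xx
    # date() itself raises ValueError for days that do not exist.
    return date(year, month, day)


def check_range(start_text, end_text):
    """Parse both dates and make sure the range does not run backwards."""
    start, end = parse_date(start_text), parse_date(end_text)
    if end < start:  # assumption: a same-day range is acceptable
        raise ValueError("end date precedes start date")
    return start, end


print(check_range("21MAR06", "30APR06"))
```

Note that the sample entry used throughout this thread would fail this check, since April has only 30 days and "31APR06" is therefore not a real date.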


Magnus Lycka
#11: Mar 10 '06

re: A better RE?


Fredrik Lundh wrote:
> Magnus Lycka wrote:
> r"(\d\d[A-Z]{3}\d\d) (\d\d[A-Z]{3}\d\d) (?=[1234567])(1?2?3?4?5?6?7?)"

Thanks a lot. (I knew about {3} of course, I was in a hurry
when I posted since I was close to missing my train...)
Magnus Lycka
#12: Mar 10 '06

re: A better RE?


Schüle Daniel wrote:
> >>> txt = "21MAR06 31APR06 1236"
> >>> m = '(?:JAN|FEB|MAR|APR|MAI|JUN|JUL|AUG|SEP|OCT|NOV|DEZ)'
> # non capturing group (?:)
> >>> p = re.compile(r"(\d\d%s\d\d) (\d\d%s\d\d) (?=[1234567])(1?2?3?4?5?6?7?)" % (m,m))
> >>> p.match(txt).group(1)
> '21MAR06'
> >>> p.match(txt).group(2)
> '31APR06'
> >>> p.match(txt).group(3)
> '1236'

Excellent. Thanks!