regex matching question

first, regex part:

I am new to regexes and have come up with the following expression:
((1[0-4]|[1-9]),(1[0-4]|[1-9])/){5}(1[0-4]|[1-9]),(1[0-4]|[1-9])

to exactly match strings which look like this:

1,2/3,4/5,6/7,8/9,10/11,12

i.e. 6 comma-delimited pairs of integer numbers separated by the
backslash character + constraint that numbers must be in range 1-14.

i should add that i am only interested in finding exact matches (doing
some kind of command line validation).

this seems to work fine, although i would welcome any advice about how
to shorten the above. it seems to me that there should exist some
shorthand for (1[0-4]|[1-9]) once i have defined it once?
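
On the shorthand question: as far as I know, Python's standard `re` module has no way to re-use a sub-pattern by reference (a backreference such as \1 re-matches the same text, not the same pattern). A common workaround is to build the pattern string from named pieces in Python. The sketch below assumes that approach; the names NUM, PAIR and is_well_formed are made up for illustration, and \Z is added so the match has to reach the very end of the string.

```python
import re

# Hypothetical names; only the sub-pattern for a number in the range 1-14
# is taken from the post above.
NUM = r"(?:1[0-4]|[1-9])"        # one integer, 1..14, no leading zero
PAIR = NUM + "," + NUM           # two such integers separated by a comma

# Five "pair/" groups, then a final pair, then end of string.
PATTERN = re.compile(r"(?:%s/){5}%s\Z" % (PAIR, PAIR))

def is_well_formed(s):
    """Return True if s is exactly six slash-separated pairs of 1..14."""
    return PATTERN.match(s) is not None

print(is_well_formed("1,2/3,4/5,6/7,8/9,10/11,12"))
```

The non-capturing (?:...) groups are a design choice: the pieces are only there for matching, so nothing needs to be captured.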

also (and this is where my total beginner status brings me here
looking for help :)) i would like to add one more constraint to the
above regex. i want to match strings *iff* each pair of numbers are
different. e.g: 1,1/3,4/5,6/7,8/9,10/11,12 or
1,2/3,4/5,6/7,8/9,10/12,12 should fail to be matched by my final
regex whereas 1,2/3,4/5,6/7,8/9,10/11,12 should match OK.

any tips would be much appreciated - especially regarding preceding
paragraph!

and now for the python part:

    results = "1,2/3,4/5,6/7,8/9,10/11,12"
    match = re.match("((1[0-4]|[1-9]),(1[0-4]|[1-9])/){5}(1[0-4]|[1-9]),(1[0-4]|[1-9])", results)
    if match == None or match.group(0) != results:
        raise FormatError("Error in format of input string: %s" % (results))
    results = [leg.split(',') for leg in results.split('/')]
    # => [['1', '2'], ['3', '4'], ['5', '6'], ['7', '8'], ['9', '10'], ['11', '12']]
    ...

the idea in the above code being that i want to use the regex match as
a test of whether or not the input string (results) is correctly
formatted. if the string results is not exactly matched by the regex,
i want my program to barf an exception and bail out. apart from
whether or not the regex is good idiom, is my approach suitably
pythonic?

TIA for any help here.

May 19 '07 #1
In <11**********************@l77g2000hsb.googlegroups .com>,
bullockbefriending bard wrote:
> first, regex part:
>
> I am new to regexes and have come up with the following expression:
> ((1[0-4]|[1-9]),(1[0-4]|[1-9])/){5}(1[0-4]|[1-9]),(1[0-4]|[1-9])
>
> to exactly match strings which look like this:
>
> 1,2/3,4/5,6/7,8/9,10/11,12
>
> i.e. 6 comma-delimited pairs of integer numbers separated by the
> backslash character + constraint that numbers must be in range 1-14.
>
> i should add that i am only interested in finding exact matches (doing
> some kind of command line validation).
>
> […]
>
> the idea in the above code being that i want to use the regex match as
> a test of whether or not the input string (results) is correctly
> formatted. if the string results is not exactly matched by the regex,
> i want my program to barf an exception and bail out. apart from
> whether or not the regex is good idiom, is my approach suitably
> pythonic?

I would use a simple regular expression to extract "candidates" and a
Python function to split the candidate and check for the extra
constraints. Especially the "all pairs different" constraint is something
I would not even attempt to put in a regex. For searching candidates this
should be good enough::

r'(\d+,\d+/){5}\d+,\d+'

Ciao,
Marc 'BlackJack' Rintsch
May 19 '07 #2
thanks for your suggestion. i had already implemented the all pairs
different constraint in python code. even though i don't really need
to give very explicit error messages about what might be wrong with my
data (obviously easier to do if i do all constraint validation in code
rather than one regex), there is something to be said for your
suggestion to simplify my regex further - it might be sensible from a
maintainability/readability perspective to use regex for *format*
validation and then validate all *values* in code.

from my cursory skimming of friedl, i get the feeling that the all
pairs different constraint would give rise to some kind of fairly
baroque expression, perhaps likely to bring to mind the following
quotation from samuel johnson:

"Sir, a woman's preaching is like a dog's walking on his hind legs.
It is not done well; but you are surprised to find it done at all."

however, being human, sometimes some things should be done, just
because they can :)... so if anyone knows how to do it, i'm still
interested, even if just out of idle curiosity!
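
For the curious, one regex-only way to express "the two numbers in a pair must differ" is a negative lookahead containing a backreference, placed in front of each pair. This is only a sketch: the names NUM, pair and PATTERN are invented here. Each pair gets its own group name so that every backreference points at its own pair.

```python
import re

NUM = r"(?:1[0-4]|[1-9])"   # one integer in 1..14 (hypothetical helper name)

def pair(i):
    # The lookahead (?!(?P<nI>\d+),(?P=nI)(?!\d)) rejects text of the form
    # "N,N": it captures the first number, then looks for the same digits
    # again with no further digit after them, so "1,12" is not mistaken
    # for "1,1".
    return r"(?!(?P<n%d>\d+),(?P=n%d)(?!\d))%s,%s" % (i, i, NUM, NUM)

# Six pairs joined by "/", each with its own uniquely named group, and \Z
# so that nothing may follow the last pair.
PATTERN = re.compile("/".join(pair(i) for i in range(6)) + r"\Z")

for s in ("1,2/3,4/5,6/7,8/9,10/11,12",
          "1,1/3,4/5,6/7,8/9,10/11,12",
          "1,2/3,4/5,6/7,8/9,10/12,12"):
    print(s, bool(PATTERN.match(s)))
```

It works, but the resulting pattern is hard to read, which rather supports the advice to keep value checks in ordinary code.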

May 19 '07 #3
On 20/05/2007 3:21 AM, bullockbefriending bard wrote:
> first, regex part:
>
> I am new to regexes and have come up with the following expression:
> ((1[0-4]|[1-9]),(1[0-4]|[1-9])/){5}(1[0-4]|[1-9]),(1[0-4]|[1-9])
>
> to exactly match strings which look like this:
>
> 1,2/3,4/5,6/7,8/9,10/11,12
>
> i.e. 6 comma-delimited pairs of integer numbers separated by the
> backslash character + constraint that numbers must be in range 1-14.

Backslash? Your example uses a [forward] slash.

Are you sure you don't want to allow for some spaces in the data, for
the benefit of the humans, e.g.
1,2 / 3,4 / 5,6 / 7,8 / 9,10 / 11,12
?

> i should add that i am only interested in finding exact matches (doing
> some kind of command line validation).
>
> this seems to work fine, although i would welcome any advice about how
> to shorten the above. it seems to me that there should exist some
> shorthand for (1[0-4]|[1-9]) once i have defined it once?
>
> also (and this is where my total beginner status brings me here
> looking for help :)) i would like to add one more constraint to the
> above regex. i want to match strings *iff* each pair of numbers are
> different. e.g: 1,1/3,4/5,6/7,8/9,10/11,12 or
> 1,2/3,4/5,6/7,8/9,10/12,12 should fail to be matched by my final
> regex whereas 1,2/3,4/5,6/7,8/9,10/11,12 should match OK.
>
> any tips would be much appreciated - especially regarding preceding
> paragraph!
>
> and now for the python part:
>
> results = "1,2/3,4/5,6/7,8/9,10/11,12"
> match = re.match("((1[0-4]|[1-9]),(1[0-4]|[1-9])/){5}(1[0-4]|[1-9]),(1[0-4]|[1-9])", results)

Always use "raw" strings for patterns, even if you don't have
backslashes in them -- and this one needs a backslash; see below.

For clarity, consider using "mobj" or even "m" instead of "match" to
name the result of re.match.

> if match == None or match.group(0) != results:

Instead of
    if mobj == None ....
use
    if mobj is None ...
or
    if not mobj ...

Instead of the "or match.group(0) != results" caper, put \Z (*not* $) at
the end of your pattern:
    mobj = re.match(r"pattern\Z", results)
    if not mobj:
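
Pulling the suggestions above together, a minimal sketch might look like the following. FormatError is used but never defined in the original post, so its definition here is an assumption, as is the function name parse_results.

```python
import re

class FormatError(ValueError):
    """Assumed definition; the original post does not show this class."""

# Raw string, with \Z so the match must reach the very end of the input.
PATTERN = re.compile(
    r"((1[0-4]|[1-9]),(1[0-4]|[1-9])/){5}(1[0-4]|[1-9]),(1[0-4]|[1-9])\Z"
)

def parse_results(results):
    """Validate the string, then split it into six [a, b] string pairs."""
    mobj = PATTERN.match(results)
    if mobj is None:
        raise FormatError("Error in format of input string: %s" % results)
    return [leg.split(",") for leg in results.split("/")]

print(parse_results("1,2/3,4/5,6/7,8/9,10/11,12"))
```

With \Z in the pattern, the separate comparison of match.group(0) against the input is no longer needed.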
HTH,
John
May 19 '07 #4
On Sat, 19 May 2007 19:40:39 -0300, bullockbefriending bard
<ki*******@gmail.com> wrote:
> from my cursory skimming of friedl, i get the feeling that the all
> pairs different constraint would give rise to some kind of fairly
> baroque expression, perhaps likely to bring to mind the following
> quotation from samuel johnson:
>
> "Sir, a woman's preaching is like a dog's walking on his hind legs.
> It is not done well; but you are surprised to find it done at all."

Try this, it's not as hard, just using match and split (with the regular
expression proposed by MR):

    import re
    regex = re.compile(r'(\d+,\d+/){5}\d+,\d+')

    def checkline(line):
        if not regex.match(line):
            raise ValueError("Invalid format: "+line)
        for pair in line.split("/"):
            a, b = pair.split(",")
            if a==b:
                raise ValueError("Duplicate number: "+line)

Here "all pairs different" means "for each pair, both numbers must be
different", but they may appear in another pair. That is, won't flag
"1,2/3,4/3,5/2,6/8,3/1,2" as invalid, but this wasn't clear from your
original post.
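
A possible extension of the same idea, sketched under a few assumptions: the loose pattern gets a trailing \Z so that extra text after the sixth pair is rejected, the numbers are converted to int before comparing, and the 1-14 range from the original question is checked in Python. The name check_results and the low/high parameters are hypothetical.

```python
import re

# Loose format check (the pattern proposed earlier in the thread), plus \Z
# so that trailing text is rejected. Values are validated in plain Python.
FORMAT = re.compile(r"(\d+,\d+/){5}\d+,\d+\Z")

def check_results(line, low=1, high=14):
    """Return six (a, b) integer pairs or raise ValueError."""
    if FORMAT.match(line) is None:
        raise ValueError("Invalid format: " + line)
    pairs = []
    for leg in line.split("/"):
        a, b = (int(n) for n in leg.split(","))
        if not (low <= a <= high and low <= b <= high):
            raise ValueError("Number out of range %d-%d: %s" % (low, high, leg))
        if a == b:
            raise ValueError("Duplicate number in pair: " + leg)
        pairs.append((a, b))
    return pairs

print(check_results("1,2/3,4/3,5/2,6/8,3/1,2"))
```

Comparing integers rather than strings means a pair such as "01,1" counts as a duplicate. Whether leading zeros should be accepted at all is a separate decision that this sketch leaves open.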

--
Gabriel Genellina

May 19 '07 #5
> Backslash? Your example uses a [forward] slash.

correct.. my mistake. i use forward slashes.

> Are you sure you don't want to allow for some spaces in the data, for
> the benefit of the humans, e.g.
> 1,2 / 3,4 / 5,6 / 7,8 / 9,10 / 11,12

you are correct. however, i am using string as a command line option
and can get away without quoting it if there are no optional spaces.

> Always use "raw" strings for patterns, even if you don't have
> backslashes in them -- and this one needs a backslash; see below.

knew this, but had not done so in my code because wanted to use '\' as
a line continuation character to keep everything within 80 columns.
have adopted your advice regarding \Z below and now am using raw
string.
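
On the 80-column concern: Python joins adjacent string literals at compile time, so a long raw pattern can be split across lines inside parentheses without any backslash continuation. A small sketch, where the name PATTERN is an assumption:

```python
import re

# Adjacent string literals inside parentheses are joined by the compiler,
# so no backslash continuation is needed and each piece can stay raw.
PATTERN = re.compile(
    r"((1[0-4]|[1-9]),(1[0-4]|[1-9])/){5}"   # five "a,b/" groups
    r"(1[0-4]|[1-9]),(1[0-4]|[1-9])"         # the final "a,b"
    r"\Z"                                    # nothing may follow
)

print(PATTERN.pattern)
```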

> For clarity, consider using "mobj" or even "m" instead of "match" to
> name the result of re.match.

good point.

> > if match == None or match.group(0) != results:
>
> Instead of
>     if mobj == None ....
> use
>     if mobj is None ...
> or
>     if not mobj ...
>
> Instead of the "or match.group(0) != results" caper, put \Z (*not* $) at
> the end of your pattern:
>     mobj = re.match(r"pattern\Z", results)
>     if not mobj:
>
> HTH,
> John

very helpful advice. thanks!

May 19 '07 #6
> Instead of the "or match.group(0) != results" caper, put \Z (*not* $) at
> the end of your pattern:
>     mobj = re.match(r"pattern\Z", results)
>     if not mobj:

as the string i am matching against is coming from a command line
argument to a script, is there any reason why i cannot get away with
just $ given that this means that there is no way a newline could find
its way into my string? certainly passes all my unit tests as well as
\Z. or am i missing the point of \Z ?
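
The difference only shows up when the input ends in a newline, which is why tests without such a case pass either way. A short sketch with a made-up input string; re.fullmatch (available since Python 3.4) is included as a third option that needs no anchor at all:

```python
import re

text = "1,2\n"   # note the trailing newline

# $ matches at the end of the string *or* just before a final newline.
dollar = bool(re.match(r"\d+,\d+$", text))
# \Z matches only at the very end of the string.
slash_z = bool(re.match(r"\d+,\d+\Z", text))
# fullmatch requires the whole string to match the pattern.
full = bool(re.fullmatch(r"\d+,\d+", text))

print(dollar, slash_z, full)   # True False False
```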

May 20 '07 #7
> Here "all pairs different" means "for each pair, both numbers must be
> different", but they may appear in another pair. That is, won't flag
> "1,2/3,4/3,5/2,6/8,3/1,2" as invalid, but this wasn't clear from your
> original post.
>
> --
> Gabriel Genellina

thanks! you are correct that the 'all pairs different' nomenclature is
ambiguous. i require that each pair have different values, but is OK
for different pairs to be identical... so exactly as per your code
snippet.

May 20 '07 #8
On 20/05/2007 10:18 AM, bullockbefriending bard wrote:
> > Instead of the "or match.group(0) != results" caper, put \Z (*not* $) at
> > the end of your pattern:
> >     mobj = re.match(r"pattern\Z", results)
> >     if not mobj:
>
> as the string i am matching against is coming from a command line
> argument to a script, is there any reason why i cannot get away with
> just $ given that this means that there is no way a newline could find
> its way into my string?

No way? Famous last words :-)

    C:\junk>type showargs.py
    import sys; print sys.argv

    C:\junk>\python25\python
    Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import subprocess
    >>> subprocess.call('\\python25\\python showargs.py teehee\n')
    ['showargs.py', 'teehee\n']
    0
    >>>

> certainly passes all my unit tests as well as
> \Z. or am i missing the point of \Z ?
May 20 '07 #9
John Machin wrote:
> On 20/05/2007 10:18 AM, bullockbefriending bard wrote:
> > > Instead of the "or match.group(0) != results" caper, put \Z (*not* $) at
> > > the end of your pattern:
> > >     mobj = re.match(r"pattern\Z", results)
> > >     if not mobj:
> >
> > as the string i am matching against is coming from a command line
> > argument to a script, is there any reason why i cannot get away with
> > just $ given that this means that there is no way a newline could find
> > its way into my string?
>
> No way? Famous last words :-)
>
>     C:\junk>type showargs.py
>     import sys; print sys.argv
>
>     C:\junk>\python25\python
>     Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> import subprocess
>     >>> subprocess.call('\\python25\\python showargs.py teehee\n')
>     ['showargs.py', 'teehee\n']
>     0
>     >>>
>
> > certainly passes all my unit tests as well as
> > \Z. or am i missing the point of \Z ?

The simple shell command

python prog.py "argument containing
a newline"

would suffice to reject the "no newlines" hypothesis in Unix-like systems.
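
The same experiment can be reproduced from current Python without a separate script file. This is a sketch only: the inline child program passed with -c is an assumption standing in for showargs.py.

```python
import subprocess
import sys

# Tiny child program (an assumption standing in for showargs.py): it writes
# the repr of its first argument to stdout.
child = "import sys; sys.stdout.write(repr(sys.argv[1]))"

# Passing the arguments as a list bypasses the shell, so the newline
# reaches the child process as part of the argument.
seen = subprocess.check_output(
    [sys.executable, "-c", child, "teehee\n"],
    universal_newlines=True,
)
print(seen)
```

If the child reports the argument with its trailing newline intact, a pattern ending in $ would accept input that a pattern ending in \Z rejects.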

regards
Steve
--
Steve Holden

May 20 '07 #10
> No way? Famous last words :-)
>
>     C:\junk>type showargs.py
>     import sys; print sys.argv
>
>     C:\junk>\python25\python
>     Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> import subprocess
>     >>> subprocess.call('\\python25\\python showargs.py teehee\n')
>     ['showargs.py', 'teehee\n']

can't argue with that :) back to \Z

May 20 '07 #11
