RegExp question

Hi,

I would like to form a regular expression to find a few different
tokens (and, or, xor) followed by some variable number of whitespace
(i.e., tabs and spaces) followed by a hash mark (i.e., #). What would
be the regular expression for this?

Thanks for any help,

Michael

Apr 11 '06 #1
> I would like to form a regular expression to find a few
> different tokens (and, or, xor) followed by some variable
> number of whitespace (i.e., tabs and spaces) followed by
> a hash mark (i.e., #). What would be the regular
> expression for this?

(and|or|xor)\s*#

Unless "variable number of whitespace" means "at least *some*
whitespace", in which case you'd want to use

(and|or|xor)\s+#

Both are beautiful and precise.
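
A minimal sketch of how the two patterns differ, assuming they are tried with Python's re module. The sample strings are made up for illustration:

```python
import re

# \s* allows zero or more whitespace characters before the hash mark;
# \s+ requires at least one.
optional_ws = re.compile(r"(and|or|xor)\s*#")
required_ws = re.compile(r"(and|or|xor)\s+#")

samples = ["a and #x", "a and#x", "a or \t #x", "a and b"]

# Map each sample to (matches with \s*, matches with \s+).
results = {
    s: (bool(optional_ws.search(s)), bool(required_ws.search(s)))
    for s in samples
}

for s, (opt, req) in results.items():
    print(repr(s), opt, req)
```

Only the sample with no space between the token and the hash mark ("a and#x") tells the two patterns apart.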

-tim


Apr 11 '06 #2
Tim,

for some reason that does not seem to do the trick.

I am testing it with grep. (i.e., grep -e '(and|or|xor)\s*#' myfile)

Michael

Apr 11 '06 #3
"Michael McGarry" <mi*************@gmail.com> wrote in message
news:11**********************@t31g2000cwb.googlegroups.com...
> Hi,
>
> I would like to form a regular expression to find a few different
> tokens (and, or, xor) followed by some variable number of whitespace
> (i.e., tabs and spaces) followed by a hash mark (i.e., #). What would
> be the regular expression for this?
>
> Thanks for any help,
>
> Michael

Using pyparsing, whitespace is implicitly ignored. Your expression would
look like:

oneOf("and or xor") + Literal("#")

Here's a complete example:

from pyparsing import Literal, Word, alphanums, col, line, oneOf

pattern = oneOf("and or xor") + Literal("#")

testString = """
z = (a and b) and #XVAL;
q = z xor #YVAL;
"""

# use scanString to locate matches
for tokens, start, end in pattern.scanString(testString):
    print(tokens[0], tokens.asList())
    print(line(start, testString))
    # put a caret under the column where the match starts
    print(" " * (col(start, testString) - 1) + "^")
    print()
print()

# use transformString to locate matches and substitute values
subs = {
    'XVAL': 0,
    'YVAL': True,
}

def replaceSubs(st, loc, toks):
    # toks is [token, "#", name]; substitute the name if it is known
    try:
        return toks[0] + " " + str(subs[toks[2]])
    except KeyError:
        # unknown name: return None so the matched text is left unchanged
        pass

pattern2 = (pattern + Word(alphanums)).setParseAction(replaceSubs)
print(pattern2.transformString(testString))

-----------------
Prints:
and ['and', '#']
z = (a and b) and #XVAL;
^

xor ['xor', '#']
q = z xor #YVAL;
^
z = (a and b) and 0;
q = z xor True;
Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

Apr 11 '06 #4
On Tuesday, 11 April 2006 at 21:16, Michael McGarry wrote:
> I am testing it with grep. (i.e., grep -e '(and|or|xor)\s*#' myfile)


Test it with Python's re module, then. \s for matching whitespace is a
Perl-style shorthand that Python supports; it is not part of POSIX regex
syntax (AFAIK). And as you've asked in a Python newsgroup, you'll get
Python answers here.

--- Heiko.
Apr 11 '06 #5
In my opinion you would be best off using a tool like Kiki.
http://project5.freezope.org/kiki/index.html/#

This will allow you to paste in the actual text you want to search and
then play with different REs and set flags with a simple mouse click
so you can find just what you want. Remember what re.DOTALL does: it
makes "." match any character, including a newline; without it, "."
matches anything except a newline. It's a good idea to have a grasp
of regular expressions, or when you come back to your code weeks or
months later you will be just as lost. And always comment them very
well :).
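
As a rough illustration of both points (what re.DOTALL changes, and how a regex can carry its own comments), here is a small sketch. The sample text and variable names are made up, and re.VERBOSE is just one way to comment a pattern:

```python
import re

text = "first line and\n# second line"

# Without re.DOTALL, "." stops at a newline, so "and.#" cannot span lines.
no_dotall = re.search(r"and.#", text)
# With re.DOTALL, "." also matches the newline character.
with_dotall = re.search(r"and.#", text, re.DOTALL)

# re.VERBOSE lets a pattern carry its own comments. Inside a verbose
# pattern an unescaped "#" starts a comment, so the literal hash mark
# is written as "[#]".
commented = re.compile(
    r"""
    \b(and|or|xor)   # one of the three tokens, starting at a word boundary
    \s*              # optional run of whitespace
    [#]              # a literal hash mark
    """,
    re.VERBOSE,
)

print(no_dotall)
print(repr(with_dotall.group()))
print(commented.search("q = z xor   #YVAL;").group(1))
```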

Just my 2¢

Apr 11 '06 #6
On 2006-04-11, Michael McGarry <mi*************@gmail.com> wrote:
> Hi,
>
> I would like to form a regular expression to find a few different
> tokens (and, or, xor) followed by some variable number of whitespace
> (i.e., tabs and spaces) followed by a hash mark (i.e., #). What would
> be the regular expression for this?


re.compile(r'(?:and|or|xor)\s*#')
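
The (?:...) form groups the alternatives without capturing them. A small sketch of where that shows up, assuming re.findall is used to collect matches (the sample text is borrowed from earlier in the thread):

```python
import re

text = "z = (a and b) and #XVAL;\nq = z xor #YVAL;"

capturing = re.compile(r"(and|or|xor)\s*#")
non_capturing = re.compile(r"(?:and|or|xor)\s*#")

# With a capturing group, findall returns only the group's text.
captured = capturing.findall(text)
# With a non-capturing group, findall returns the whole match.
whole = non_capturing.findall(text)

print(captured)
print(whole)
```

Both patterns match at the same places; they differ only in what the match reports back.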
Apr 11 '06 #7
> I am testing it with grep. (i.e., grep -e '(and|or|xor)\s*#' myfile)

Well, you asked for the python regexp...different
environments use different regexp parsing engines. Your
response is akin to saying "the example snippet of python
code you gave me doesn't work in my Pascal program".

For grep:

grep '\(and\|or\|xor\)[[:space:]]*#' myfile

For Vim:

:g/\(and\|or\|xor\)\s*#/

The one I gave originally is a python regexp, and thus
should be tested within python, not grep or vim or emacs or
sed or whatever.

It's always best to test in the real
environment...otherwise, you'll get flakey results.
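
To see the dialect mismatch from the Python side, here is a sketch that feeds a grep-style (basic regex) pattern to Python's re module. In grep's basic syntax \( \) and \| are the operators, while in Python a backslash makes those characters literal. The variable names are made up for illustration:

```python
import re

line = "if foo and #continued"

python_style = r"(and|or|xor)\s*#"
# Escaped the way a basic-regex grep pattern would be written.
bre_style = r"\(and\|or\|xor\)[ \t]*#"

# Python reads ( | ) as grouping and alternation, so this matches.
py_match = re.search(python_style, line)
# Python reads \( \| \) as literal characters, so this finds nothing...
bre_match = re.search(bre_style, line)
# ...unless the text really contains "(and|or|xor)" followed by "#".
literal_match = re.search(bre_style, "(and|or|xor) #")

print(py_match is not None, bre_match is not None, literal_match is not None)
```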

-tkc


Apr 11 '06 #8
(-:
Sorry about Tim. He's not very imaginative. He presumed that because
you asked on comp.lang.python that you would be testing it with Python.
You should have either (a) asked your question on
comp.toolswithfunnynames.grep or (b) not presumed that grep's re syntax
is the same as Python's.
:-)

My grep appears to need something fugly like this:

grep -e "\(and\|or\|xor\)[ \t]*#" grepre.txt

but my grep is a Windows port which identifies itself as "grep (GNU
grep) 2.5.1" so it's definitely not The One True Grep ...

Now that you're here, why don't you try Python? It's not hard, e.g.

#>>> import re
#>>> rs = re.compile(r"(and|or|xor)\s*#").search
#>>> rs("if foo and #continued")
#<_sre.SRE_Match object at 0x00AE66E0>
#>>> rs("if foo and#continued")
#<_sre.SRE_Match object at 0x00AE6620>
#>>> rs("if foo and bar #continued")
#>>> rs("if foo xor # continued")
#<_sre.SRE_Match object at 0x00AE66E0>
#>>>

HTH,
John

Apr 11 '06 #9
On 2006-04-11, Michael McGarry <mi*************@gmail.com> wrote:
> Tim,
>
> for some reason that does not seem to do the trick.
>
> I am testing it with grep. (i.e., grep -e '(and|or|xor)\s*#' myfile)


Try with grep -P, which means use perl-compatible regexes as opposed to
POSIX ones. I only know for sure that -P exists for GNU grep.

I assumed it was a Python question! Unless you're testing your Python
regex with grep, not realizing they're different.

Perl and Python regexes are (mostly?) the same.

I usually grep -P because I know Python regexes better than any other
ones.
Apr 11 '06 #10
Precise? The OP asked for "tokens".

#>>> re.search(r"(and|or|xor)\s*#", "a = the_operand # gotcha!")
#<_sre.SRE_Match object at 0x00AE6620>

Try this:

#>>> re.search(r"\b(and|or|xor)\s*#", "a = the_operand # should fail")
#>>> re.search(r"\b(and|or|xor)\s*#", "and # OK")
#<_sre.SRE_Match object at 0x00AE6E60>
#>>> re.search(r"\b(and|or|xor)\s*#", "blah blah and # OK")
#<_sre.SRE_Match object at 0x00AE66E0>
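
Putting the \b version to work, here is a rough grep-like sketch in Python that reports the line number and token for each matching line. The function name find_token_lines and the sample lines are made up for illustration:

```python
import re

TOKEN_BEFORE_HASH = re.compile(r"\b(and|or|xor)\s*#")

def find_token_lines(lines):
    """Yield (line_number, token) for each line where a token precedes '#'."""
    for number, text in enumerate(lines, start=1):
        match = TOKEN_BEFORE_HASH.search(text)
        if match:
            yield number, match.group(1)

sample = [
    "a = the_operand # should not match",
    "z = (a and b) and #XVAL;",
    "q = z xor\t#YVAL;",
    "floor #3",
]

hits = list(find_token_lines(sample))
print(hits)
```

The first and last sample lines are skipped because "and" and "or" there are the tails of longer words, so \b finds no word boundary in front of them.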

Apr 11 '06 #11
