RegExp question

Hi,

I would like to form a regular expression to find a few different
tokens (and, or, xor) followed by some variable number of whitespace
(i.e., tabs and spaces) followed by a hash mark (i.e., #). What would
be the regular expression for this?

Thanks for any help,

Michael

Apr 11 '06 #1
> I would like to form a regular expression to find a few
> different tokens (and, or, xor) followed by some variable
> number of whitespace (i.e., tabs and spaces) followed by
> a hash mark (i.e., #). What would be the regular
> expression for this?

(and|or|xor)\s*#

Unless "variable number of whitespace" means "at least *some*
whitespace", in which case you'd want to use

(and|or|xor)\s+#

Both are beautiful and precise.
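
To make the difference between the two patterns concrete, here is a small sketch using Python's re module. The sample strings and variable names are made up for illustration; only the two patterns come from the post above.

```python
import re

# \s* allows zero or more whitespace characters before the hash mark
zero_or_more = re.compile(r"(and|or|xor)\s*#")
# \s+ requires at least one whitespace character before the hash mark
one_or_more = re.compile(r"(and|or|xor)\s+#")

# Hypothetical test inputs
samples = ["a and #b", "a and#b", "a or \t #b", "a xor b"]

# Map each sample to (matched by \s*, matched by \s+)
results = {
    s: (bool(zero_or_more.search(s)), bool(one_or_more.search(s)))
    for s in samples
}

for s, (star, plus) in results.items():
    print(repr(s), "\\s*:", star, "\\s+:", plus)
```

The only sample the two patterns disagree on is "a and#b", where the token touches the hash mark directly.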

-tim


Apr 11 '06 #2
Tim,

for some reason that does not seem to do the trick.

I am testing it with grep. (i.e., grep -e '(and|or|xor)\s*#' myfile)

Michael

Apr 11 '06 #3
"Michael McGarry" <mi*************@gmail.com> wrote in message
news:11**********************@t31g2000cwb.googlegroups.com...
> Hi,
>
> I would like to form a regular expression to find a few different
> tokens (and, or, xor) followed by some variable number of whitespace
> (i.e., tabs and spaces) followed by a hash mark (i.e., #). What would
> be the regular expression for this?
>
> Thanks for any help,
>
> Michael

Using pyparsing, whitespace is implicitly ignored. Your expression would
look like:

oneOf("and or xor") + Literal("#")

Here's a complete example:

from pyparsing import *

pattern = oneOf("and or xor") + Literal("#")

testString = """
z = (a and b) and #XVAL;
q = z xor #YVAL;
"""

# use scanString to locate matches
for tokens, start, end in pattern.scanString(testString):
    print(tokens[0], tokens.asList())
    print(line(start, testString))
    # put a caret under the column where the match starts
    print(" " * (col(start, testString) - 1) + "^")
    print()
print()

# use transformString to locate matches and substitute values
subs = {
    'XVAL': 0,
    'YVAL': True,
}

def replaceSubs(st, loc, toks):
    # toks is [keyword, '#', name]; unknown names are left unchanged
    try:
        return toks[0] + " " + str(subs[toks[2]])
    except KeyError:
        pass

pattern2 = (pattern + Word(alphanums)).setParseAction(replaceSubs)
print(pattern2.transformString(testString))

-----------------
Prints:

and ['and', '#']
z = (a and b) and #XVAL;
              ^

xor ['xor', '#']
q = z xor #YVAL;
      ^

z = (a and b) and 0;
q = z xor True;

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

Apr 11 '06 #4
On Tuesday 11 April 2006 21:16, Michael McGarry wrote:
> I am testing it with grep. (i.e., grep -e '(and|or|xor)\s*#' myfile)


Test it with Python's re module, then. \s for matching whitespace is not
part of POSIX regex syntax (AFAIK), and plain grep treats unescaped ( and |
as literal characters. And as you've asked in a Python newsgroup, you'll
get Python answers here.

--- Heiko.
Apr 11 '06 #5
In my opinion you would be best to use a tool like Kiki.
http://project5.freezope.org/kiki/index.html/#

This will allow you to paste in the actual text you want to search and
then play with different RE's and set flags with a simple mouse click
so you can find just what you want. Remember what re.DOTALL does: it
makes '.' match newlines as well, so a pattern that uses '.' can run
across line breaks; without it, '.' stops at the end of each line. It's
a good idea to have a grasp of regular expressions, or when you come
back to your code weeks or months later you will be just as lost. And
always comment them very well :).
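
As a rough illustration of the re.DOTALL point, the sketch below runs the same pattern with and without the flag. The two-line input string is invented for the example.

```python
import re

# Hypothetical input: the token and the hash mark sit on different lines
text = "and\n#"

# Without DOTALL, '.' does not match the newline between the two lines
plain = re.search(r"and.#", text)
# With DOTALL, '.' matches any character, including the newline
dotall = re.search(r"and.#", text, re.DOTALL)
# \s matches newlines with or without the flag
whitespace = re.search(r"and\s*#", text)

print(plain is None, dotall is not None, whitespace is not None)
```

Since \s also matches newlines, a pattern that should only accept tabs and spaces on one line would presumably use [ \t]* instead of \s*.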

Just my 2¢

Apr 11 '06 #6
On 2006-04-11, Michael McGarry <mi*************@gmail.com> wrote:
> Hi,
>
> I would like to form a regular expression to find a few different
> tokens (and, or, xor) followed by some variable number of whitespace
> (i.e., tabs and spaces) followed by a hash mark (i.e., #). What would
> be the regular expression for this?


re.compile(r'(?:and|or|xor)\s*#')
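
The (?:...) form above is a non-capturing group. A minimal sketch of how it differs from the plain parentheses used earlier in the thread, assuming the sample line from the pyparsing post as input:

```python
import re

capturing = re.compile(r"(and|or|xor)\s*#")
non_capturing = re.compile(r"(?:and|or|xor)\s*#")

line = "q = z xor #YVAL;"

m1 = capturing.search(line)
m2 = non_capturing.search(line)

# Both patterns match the same text ...
print(m1.group(0), m2.group(0))
# ... but only the capturing version remembers which token was found
print(m1.groups(), m2.groups())
```

Use the capturing form if you need to know which of the three tokens matched; the non-capturing form is enough if you only care whether a line matches.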
Apr 11 '06 #7
> I am testing it with grep. (i.e., grep -e '(and|or|xor)\s*#' myfile)

Well, you asked for the python regexp...different
environments use different regexp parsing engines. Your
response is akin to saying "the example snippet of python
code you gave me doesn't work in my Pascal program".

For grep:

grep '\(and\|or\|xor\)[[:space:]]*#' myfile

For Vim:

:g/\(and\|or\|xor\)\s*#/

The one I gave originally is a python regexp, and thus
should be tested within python, not grep or vim or emacs or
sed or whatever.

It's always best to test in the real
environment...otherwise, you'll get flakey results.
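
One way to test the Python pattern in its real environment is a few lines of Python that behave roughly like grep. This is only a sketch: grep_lines and the sample lines are names invented for illustration, not part of any library.

```python
import re

PATTERN = re.compile(r"(and|or|xor)\s*#")

def grep_lines(lines, pattern=PATTERN):
    """Return (line_number, line) pairs for every line containing a match."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]

# Hypothetical file contents
sample = [
    "z = (a and b) and #XVAL;\n",
    "q = z xor #YVAL;\n",
    "r = a or b\n",
]

for number, line in grep_lines(sample):
    print(f"{number}:{line}")
```

An open file object can be passed in place of the list, since grep_lines only iterates over its argument line by line.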

-tkc


Apr 11 '06 #8
(-:
Sorry about Tim. He's not very imaginative. He presumed that because
you asked on comp.lang.python that you would be testing it with Python.
You should have either (a) asked your question on
comp.toolswithfunnynames.grep or (b) not presumed that grep's re syntax
is the same as Python's.
:-)

My grep appears to need something fugly like this:

grep -e "\(and\|or\|xor\)[[:blank:]]*#" grepre.txt

but my grep is a Windows port which identifies itself as "grep (GNU
grep) 2.5.1" so it's definitely not The One True Grep ...

Now that you're here, why don't you try Python? It's not hard, e.g.

#>>> import re
#>>> rs = re.compile(r"(and|or|xor)\s*#").search
#>>> rs("if foo and #continued")
#<_sre.SRE_Match object at 0x00AE66E0>
#>>> rs("if foo and#continued")
#<_sre.SRE_Match object at 0x00AE6620>
#>>> rs("if foo and bar #continued")
#>>> rs("if foo xor # continued")
#<_sre.SRE_Match object at 0x00AE66E0>
#>>>

HTH,
John

Apr 11 '06 #9
On 2006-04-11, Michael McGarry <mi*************@gmail.com> wrote:
> Tim,
>
> for some reason that does not seem to do the trick.
>
> I am testing it with grep. (i.e., grep -e '(and|or|xor)\s*#' myfile)


Try with grep -P, which means use perl-compatible regexes as opposed to
POSIX ones. I only know for sure that -P exists for GNU grep.

I assumed it was a Python question! Unless you're testing your Python
regex with grep, not realizing they're different.

Perl and Python regexes are (mostly?) the same.

I usually grep -P because I know Python regexes better than any other
ones.
Apr 11 '06 #10
Precise? The OP asked for "tokens".

#>>> re.search(r"(and|or|xor)\s*#", "a = the_operand # gotcha!")
#<_sre.SRE_Match object at 0x00AE6620>

Try this:

#>>> re.search(r"\b(and|or|xor)\s*#", "a = the_operand # should fail")
#>>> re.search(r"\b(and|or|xor)\s*#", "and # OK")
#<_sre.SRE_Match object at 0x00AE6E60>
#>>> re.search(r"\b(and|or|xor)\s*#", "blah blah and # OK")
#<_sre.SRE_Match object at 0x00AE66E0>
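
The same comparison as a small script, in case it is easier to rerun than the interactive session. The test strings other than the first two are made up, and the names loose and bounded are only for this sketch.

```python
import re

# Pattern from earlier in the thread: matches the token anywhere
loose = re.compile(r"(and|or|xor)\s*#")
# Same pattern, but the token must start at a word boundary
bounded = re.compile(r"\b(and|or|xor)\s*#")

cases = [
    "a = the_operand # gotcha!",  # 'and' is the tail of an identifier
    "blah blah and # OK",
    "xor# OK",
    "floor # no",                 # 'or' is the tail of another word
]

# Map each case to (matched by loose, matched by bounded)
outcomes = {s: (bool(loose.search(s)), bool(bounded.search(s))) for s in cases}

for s, (l, b) in outcomes.items():
    print(repr(s), "loose:", l, "bounded:", b)
```

No trailing \b is needed here, because the token must be followed directly by whitespace or the hash mark, neither of which is a word character.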

Apr 11 '06 #11
