Bytes | Software Development & Data Engineering Community

Regular Expressions in Python

Dear Gurus,

I am trying to find out how to write an effective regular expression
in Python for the following scenario:

"any number of leading spaces at the beginning of a line", "followed
by a string", "optionally followed by a string that starts with *"

for example:

END *This is a comment

but I don't want to match this:

END e * This is a line with an error (e)

thanks,
Noel
Jul 18 '05 #1
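import re

opt_spaces = " *"                       # zero or more spaces
identifier = "[A-Za-z_][A-Za-z0-9_]*"   # letter or _, then letters, digits or _
comment = r"\*.*"                       # '*' followed by the rest of the line
opt_comment = "(%s)?" % comment         # the comment is optional

# The trailing "$" rejects anything left over after the keyword and comment.
pat = re.compile(opt_spaces + identifier + opt_spaces + opt_comment + "$")

for test in (
        " END *This is a comment",
        " END e * This is a line with an error (e)"):
    print(test, pat.match(test))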

Jeff

Jul 18 '05 #2
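
A minimal sketch that folds the same pieces into one verbose pattern with named groups. It assumes tabs as well as spaces may appear around the keyword, and it uses re.fullmatch() (Python 3.4+) instead of a trailing "$", because "$" also matches just before a final newline. LINE_RE and parse_line are hypothetical names, not part of the thread.

```python
import re

# Hypothetical one-pattern version of the approach above.
# Assumption: tabs as well as spaces may appear around the keyword.
LINE_RE = re.compile(r"""
    [ \t]*                              # any leading whitespace
    (?P<keyword>[A-Za-z_][A-Za-z0-9_]*) # the keyword, e.g. END
    [ \t]*                              # optional whitespace before a comment
    (?P<comment>\*.*)?                  # optional comment starting with '*'
    """, re.VERBOSE)


def parse_line(line):
    """Return (keyword, comment) for a valid line, or None if it is invalid."""
    m = LINE_RE.fullmatch(line)  # the whole line must match, so no "$" needed
    if m is None:
        return None
    return m.group("keyword"), m.group("comment")


for test in (" END *This is a comment",
             " END e * This is a line with an error (e)"):
    print(repr(test), "->", parse_line(test))
```

A missing comment shows up as None in the second position, so callers can tell "END" from "END *comment" without a second match.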
"Jeff Epler" <je****@unpythonic.net> wrote in message
news:ma*************************************@python.org...

Assuming you're more interested in the identifier than in the comment,
change identifier to "([A-Za-z_][A-Za-z0-9_]*)" so that the keyword is
captured as a group and returned by pat.match(line).groups().

-- Paul
Jul 18 '05 #3
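
A minimal sketch of the capturing-group suggestion, assuming only the keyword is wanted. groups() is a method of the match object that pat.match(line) returns, and it returns a tuple with one entry per capturing group. Making the comment group non-capturing with (?:...) is an addition not found in the replies; it keeps the comment out of that tuple.

```python
import re

# Keyword wrapped in a capturing group; the comment group is non-capturing
# (?:...), so groups() contains only the keyword.
opt_spaces = " *"
identifier = "([A-Za-z_][A-Za-z0-9_]*)"
opt_comment = r"(?:\*.*)?"
pat = re.compile(opt_spaces + identifier + opt_spaces + opt_comment + "$")

m = pat.match(" END *This is a comment")
if m:
    print(m.groups())   # tuple with one entry per capturing group
    print(m.group(1))   # the keyword itself
```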
"fossil_blue" <no********@excite.com> wrote in message
news:c7**************************@posting.google.com...

Here's sample code that handles this with both the re module and pyparsing.
Note that the single .ignore() call takes care of ignoring comments on all
contained grammar constructs, and insignificant whitespace is implicitly
skipped (so there's no need to litter your matching expressions with lots of
opt_spaces-type content).

-- Paul
========================
from pyparsing import (Word, alphas, alphanums, restOfLine, LineEnd,
                       ParseException)

testdata = """
END *This is a comment
 END*This is a comment (but the next line has no comment)
 END
 END e * This is a line with an error (e)"""

def enquote(st):
    return '"%s"' % st

print("test with pyparsing")
# A keyword must be the only significant token before the end of the line.
grammar = Word(alphas, alphanums).setName("keyword") + LineEnd()
comment = "*" + restOfLine
grammar.ignore(comment)    # skip comments wherever they appear in the grammar

for test in testdata.split("\n"):
    try:
        print(enquote(test), "\n->", end=" ")
        print(grammar.parseString(test))
    except ParseException as pe:
        print(pe)

print()

import re
print("test with re")
opt_spaces = " *"
# identifier with ()'s added, so the keyword can be accessed as a group
identifier = "([A-Za-z_][A-Za-z0-9_]*)"
comment = r"\*.*"
opt_comment = "(%s)?" % comment

pat = re.compile(opt_spaces + identifier + opt_spaces + opt_comment + "$")

for test in testdata.split("\n"):
    print(enquote(test), "\n->", end=" ")
    m = pat.match(test)
    if m:
        print(m.groups())
    else:
        print("Bad text")

========================
Gives this output:

test with pyparsing
""
-> Expected keyword (0), (1,1)
"END *This is a comment"
-> ['END']
" END*This is a comment (but the next line has no comment)"
-> ['END']
" END"
-> ['END']
" END e * This is a line with an error (e)"
-> Expected end of line (8), (1,9)

test with re
""
-> Bad text
"END *This is a comment"
-> ('END', '*This is a comment')
" END*This is a comment (but the next line has no comment)"
-> ('END', '*This is a comment (but the next line has no comment)')
" END"
-> ('END', None)
" END e * This is a line with an error (e)"
-> Bad text
Jul 18 '05 #4
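
Without pyparsing, a similar "ignore comments, then ignore whitespace" effect can be approximated with the re module alone: delete the comment, split what remains on whitespace, and require exactly one identifier-like token. This is a minimal sketch, not how pyparsing works internally; keyword_of, COMMENT_RE and KEYWORD_RE are hypothetical names, and it assumes a '*' anywhere on the line starts a comment.

```python
import re

# Hypothetical helper approximating pyparsing's .ignore(comment) plus its
# implicit whitespace skipping, using only the re module.
COMMENT_RE = re.compile(r"\*.*")                   # '*' to end of line
KEYWORD_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # identifier-like keyword


def keyword_of(line):
    """Return the single keyword on the line, or None if the line is invalid."""
    code = COMMENT_RE.sub("", line)   # drop the comment, if any
    tokens = code.split()             # whitespace is insignificant
    if len(tokens) == 1 and KEYWORD_RE.fullmatch(tokens[0]):
        return tokens[0]
    return None


testdata = """
END *This is a comment
 END*This is a comment (but the next line has no comment)
 END
 END e * This is a line with an error (e)"""

for test in testdata.split("\n"):
    print('"%s"' % test, "->", keyword_of(test))
```

On this test data it accepts and rejects the same lines as the regular-expression version in the post above.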
