Hopefully simple regular expression question

I want to match a word against a string such that 'peter' is found in
"peter bengtsson" or " hey peter," but not in "thepeter bengtsson" or
"hey peterbe," because the word has to stand on its own. The following
code works for a single word:

import re

def createStandaloneWordRegex(word):
    """ return a regular expression that can find 'peter' only if it's
    written alone (next to space, start of string, end of string,
    comma, etc) but not if inside another word like peterbe """
    return re.compile(r"""
      (
      ^ %s
      (?=\W | $)
      |
      (?<=\W)
      %s
      (?=\W | $)
      )
      """% (word, word), re.I|re.L|re.M|re.X)

def test_createStandaloneWordRegex():
    def T(word, text):
        print createStandaloneWordRegex(word).findall(text)

    T("peter", "So Peter Bengtsson wrote this")
    T("peter", "peter")
    T("peter bengtsson", "So Peter Bengtsson wrote this")

The result of running this is::

['Peter']
['peter']
[] <--- this is the problem!!
It works if the parameter is just one word (eg. 'peter') but stops
working when it's an expression (eg. 'peter bengtsson')

How do I modify my regular expression to match on expressions as well
as just single words??

Jul 19 '05 #1
pe*****@gmail.com wrote:
> It works if the parameter is just one word (eg. 'peter') but stops
> working when it's an expression (eg. 'peter bengtsson')

No, not when it's an "expression" (whatever that means), but when the
parameter contains whitespace, which is ignored in verbose mode.

> How do I modify my regular expression to match on expressions as well
> as just single words??


If you must stick with re.X, you must escape any whitespace characters
in your "word" -- see re.escape().
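
A minimal sketch of the re.escape() suggestion, written for Python 3 rather than the Python 2 used in the thread. The name standalone_regex_verbose is made up for illustration, and re.L is left out because Python 3 rejects that flag for str patterns; whether the original poster needed locale handling is not known.

```python
import re

def standalone_regex_verbose(word):
    # re.escape() backslash-escapes the space in "peter bengtsson",
    # so verbose mode (re.X) no longer throws it away.
    escaped = re.escape(word)
    return re.compile(r"""
        (
            ^ %s (?= \W | $ )
          |
            (?<= \W ) %s (?= \W | $ )
        )
        """ % (escaped, escaped), re.I | re.M | re.X)

text = "So Peter Bengtsson wrote this"
print(standalone_regex_verbose("peter").findall(text))            # ['Peter']
print(standalone_regex_verbose("peter bengtsson").findall(text))  # ['Peter Bengtsson']
```

Escaping also protects against words that contain regex metacharacters such as "." or "+", which the unescaped version would interpret as pattern syntax.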

Alternatively (1), drop re.X but this is ugly:

regex_text_no_X = r"(^%s(?=\W|$)|(?<=\W)%s(?=\W|$))" % (word, word)

Alternatively (2), consider using the \b gadget; this appears to give
the same answers as the baroque method:

regex_text_no_flab = r"\b%s\b" % word
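
To check the "same answers" remark, here is a small Python 3 sketch that runs both alternatives side by side. The helper names lookaround_regex and boundary_regex, the use of re.escape(), and the "c++" test case are additions for illustration and not part of the original reply. On the thread's examples the two agree; they can differ when the word itself starts or ends with a non-word character, because \b needs a word character on one side.

```python
import re

def lookaround_regex(word):
    # Alternative (1): the original pattern without re.X.
    w = re.escape(word)
    return re.compile(r"(^%s(?=\W|$)|(?<=\W)%s(?=\W|$))" % (w, w), re.I | re.M)

def boundary_regex(word):
    # Alternative (2): word boundaries on both sides.
    return re.compile(r"\b%s\b" % re.escape(word), re.I)

cases = [
    ("peter", "So Peter Bengtsson wrote this"),
    ("peter", "peter"),
    ("peter bengtsson", "So Peter Bengtsson wrote this"),
    ("peter", "hey peterbe,"),
    ("c++", "I like c++ a lot"),  # word ends in a non-word character
]

for word, text in cases:
    print(repr(word),
          lookaround_regex(word).findall(text),
          boundary_regex(word).findall(text))
```

Only the last case differs: the lookaround version finds 'c++', while the \b version finds nothing because there is no word boundary between "+" and the following space.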
HTH,
John

Jul 19 '05 #2
On Tue, 14 Jun 2005 13:01:58 +0200, pe*****@gmail.com wrote
(in article <11********************@g49g2000cwa.googlegroups.com>):
> How do I modify my regular expression to match on expressions as well
> as just single words??


import re

def createStandaloneWordRegex(word):
    """ return a regular expression that can find 'peter' only if it's
    written alone (next to space, start of string, end of string,
    comma, etc) but not if inside another word like peterbe """

    return re.compile(r'\b' + word + r'\b', re.I)

def test_createStandaloneWordRegex():
    def T(word, text):
        print createStandaloneWordRegex(word).findall(text)

    T("peter", "So Peter Bengtsson wrote this")
    T("peter", "peter")
    T("peter bengtsson", "So Peter Bengtsson wrote this")

test_createStandaloneWordRegex()

Works?
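
One way to answer that is to run the \b version against the four strings from the original question. This is a Python 3 sketch; the name standalone_word_regex and the re.escape() call are additions for illustration, not part of the post above.

```python
import re

def standalone_word_regex(word):
    # re.escape() is an extra precaution in case the word
    # contains regex metacharacters.
    return re.compile(r"\b" + re.escape(word) + r"\b", re.I)

rx = standalone_word_regex("peter")

should_match = ["peter bengtsson", " hey peter,"]
should_not_match = ["thepeter bengtsson", "hey peterbe,"]

for text in should_match:
    print(repr(text), bool(rx.search(text)))      # True for both
for text in should_not_match:
    print(repr(text), bool(rx.search(text)))      # False for both
```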

Jul 19 '05 #3
On 14 Jun 2005 04:01:58 -0700, rumours say that "pe*****@gmail.com"
<pe*****@gmail.com> might have written:
> I want to match a word against a string such that 'peter' is found in
> "peter bengtsson" or " hey peter," but not in "thepeter bengtsson" or
> "hey peterbe," because the word has to stand on its own. The following
> code works for a single word:


[snip]

Use \b before and after the word you are searching for, for example:

rePeter = re.compile(r"\bpeter\b", re.I)

In the documentation for the re module, Subsection 4.2.1 is Regular
Expression Syntax; it'll help a lot if you read it.
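
One detail worth checking when using \b: in an ordinary Python string literal, "\b" is the backspace character, so the pattern has to be a raw string (or use "\\b") for the regex engine to see a word boundary. A short Python 3 sketch, with variable names chosen only for illustration:

```python
import re

# In a normal string literal, "\b" is a single backspace character (0x08).
print("\b" == "\x08")   # True
print(len(r"\b"))       # 2: a backslash followed by "b"

non_raw = re.compile("\bpeter\b", re.I)   # looks for backspace + "peter" + backspace
raw = re.compile(r"\bpeter\b", re.I)      # looks for "peter" between word boundaries

print(non_raw.findall("hey peter,"))  # []
print(raw.findall("hey peter,"))      # ['peter']
```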

Cheers.
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 19 '05 #4
Thank you! I had totally forgotten about that. It works.

Jul 19 '05 #5
