Hopefully simple regular expression question

I want to match a word against a string such that 'peter' is found in
"peter bengtsson" or " hey peter," but not in "thepeter bengtsson" or
"hey peterbe," because the word has to stand on its own. The following
code works for a single word:

def createStandaloneWordRegex(word):
    """ return a regular expression that can find 'peter' only if it's
    written alone (next to space, start of string, end of string,
    comma, etc) but not if inside another word like peterbe """
    return re.compile(r"""
      (
      ^ %s
      (?=\W | $)
      |
      (?<=\W)
      %s
      (?=\W | $)
      )
      """ % (word, word), re.I|re.L|re.M|re.X)

def test_createStandaloneWordRegex():
    def T(word, text):
        print createStandaloneWordRegex(word).findall(text)

    T("peter", "So Peter Bengtsson wrote this")
    T("peter", "peter")
    T("peter bengtsson", "So Peter Bengtsson wrote this")

The result of running this is::

['Peter']
['peter']
[] <--- this is the problem!!
It works if the parameter is just one word (eg. 'peter') but stops
working when it's an expression (eg. 'peter bengtsson')

How do I modify my regular expression to match on expressions as well
as just single words??

Jul 19 '05 #1
pe*****@gmail.com wrote:
> It works if the parameter is just one word (eg. 'peter') but stops
> working when it's an expression (eg. 'peter bengtsson')

No, not when it's an "expression" (whatever that means), but when the
parameter contains whitespace, which is ignored in verbose mode.
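
A small check of that behaviour, written as a sketch for Python 3 (the variable name `verbose` is only for illustration). In verbose mode the unescaped space in the pattern is treated as layout, so the pattern effectively becomes `peterbengtsson`:

```python
import re

# With re.X (verbose mode), unescaped whitespace in the pattern is ignored,
# so this pattern is equivalent to "peterbengtsson".
verbose = re.compile("peter bengtsson", re.I | re.X)

print(verbose.findall("So Peter Bengtsson wrote this"))  # []
print(verbose.findall("peterbengtsson"))                 # ['peterbengtsson']
```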

> How do I modify my regular expression to match on expressions as well
> as just single words??

If you must stick with re.X, you must escape any whitespace characters
in your "word" -- see re.escape().
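
A minimal sketch of the re.escape() suggestion, assuming Python 3. The function name `standalone_regex_verbose` is made up for illustration, and the re.L flag from the original is left out because Python 3 only accepts it with bytes patterns:

```python
import re

def standalone_regex_verbose(phrase):
    # re.escape() backslash-escapes the space in "peter bengtsson",
    # so re.X no longer discards it as layout whitespace.
    escaped = re.escape(phrase)
    return re.compile(r"""
        (
          ^ %s (?=\W | $)        # phrase at the start of a line
          |
          (?<=\W) %s (?=\W | $)  # phrase preceded by a non-word character
        )
        """ % (escaped, escaped), re.I | re.M | re.X)

print(standalone_regex_verbose("peter bengtsson").findall("So Peter Bengtsson wrote this"))
# ['Peter Bengtsson']
```

Escaping also protects against any regex metacharacters that happen to appear in the search phrase.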

Alternatively (1), drop re.X but this is ugly:

regex_text_no_X = r"(^%s(?=\W|$)|(?<=\W)%s(?=\W|$))" % (word, word)

Alternatively (2), consider using the \b gadget; this appears to give
the same answers as the baroque method:

regex_text_no_flab = r"\b%s\b" % word
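
The two alternatives can be compared side by side. This is only a sketch for Python 3; the function names `lookaround_regex` and `boundary_regex` are hypothetical, and re.escape() is added so the phrase is always treated literally. For phrases that start and end with word characters the results agree, but they can differ when the phrase ends in a non-word character, because \b needs a word character on one side of the boundary:

```python
import re

def lookaround_regex(phrase):
    # Alternative (1): the original pattern without re.X, phrase escaped.
    p = re.escape(phrase)
    return re.compile(r"(^%s(?=\W|$)|(?<=\W)%s(?=\W|$))" % (p, p), re.I | re.M)

def boundary_regex(phrase):
    # Alternative (2): a word boundary on each side of the phrase.
    return re.compile(r"\b%s\b" % re.escape(phrase), re.I)

text = "So Peter Bengtsson wrote this"
print(lookaround_regex("peter bengtsson").findall(text))  # ['Peter Bengtsson']
print(boundary_regex("peter bengtsson").findall(text))    # ['Peter Bengtsson']

# The two differ when the phrase ends in a non-word character:
print(lookaround_regex("c++").findall("I like c++ a lot"))  # ['c++']
print(boundary_regex("c++").findall("I like c++ a lot"))    # []
```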
HTH,
John

Jul 19 '05 #2
On Tue, 14 Jun 2005 13:01:58 +0200, pe*****@gmail.com wrote
(in article <11********************@g49g2000cwa.googlegroups.c om>):
> How do I modify my regular expression to match on expressions as well
> as just single words??


import re

def createStandaloneWordRegex(word):
    """ return a regular expression that can find 'peter' only if it's
    written alone (next to space, start of string, end of string,
    comma, etc) but not if inside another word like peterbe """

    return re.compile(r'\b' + word + r'\b', re.I)

def test_createStandaloneWordRegex():
    def T(word, text):
        print createStandaloneWordRegex(word).findall(text)

    T("peter", "So Peter Bengtsson wrote this")
    T("peter", "peter")
    T("peter bengtsson", "So Peter Bengtsson wrote this")

test_createStandaloneWordRegex()

Works?

Jul 19 '05 #3
On 14 Jun 2005 04:01:58 -0700, rumours say that "pe*****@gmail.com"
<pe*****@gmail.com> might have written:
> I want to match a word against a string such that 'peter' is found in
> "peter bengtsson" or " hey peter," but not in "thepeter bengtsson" or
> "hey peterbe," because the word has to stand on its own. The following
> code works for a single word:


[snip]

use \b before and after the word you search, for example:

rePeter = re.compile(r"\bpeter\b", re.I)
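
One detail worth checking when using \b: in an ordinary Python string literal, "\b" is a backspace character, so the pattern should be written as a raw string for the regex engine to see a word boundary. A short sketch for Python 3 (the variable names are only for illustration):

```python
import re

text = "So Peter Bengtsson wrote this"

# Without the r prefix, "\b" is the backspace character (\x08),
# so this pattern looks for backspace + "peter" + backspace.
backspace_pattern = re.compile("\bpeter\b", re.I)

# With the r prefix, the regex engine receives \b and treats it
# as a word boundary.
boundary_pattern = re.compile(r"\bpeter\b", re.I)

print(backspace_pattern.findall(text))  # []
print(boundary_pattern.findall(text))   # ['Peter']
```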

In the documentation for the re module, Subsection 4.2.1 is Regular
Expression Syntax; it'll help a lot if you read it.

Cheers.
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 19 '05 #4
Thank you! I had totally forgotten about that. It works.

Jul 19 '05 #5
