Hopefully simple regular expression question

I want to match a word against a string such that 'peter' is found in
"peter bengtsson" or " hey peter," but not in "thepeter bengtsson" or
"hey peterbe," because the word has to stand on its own. The following
code works for a single word:

def createStandaloneWordRegex(word):
    """ return a regular expression that can find 'peter' only if it's
    written alone (next to space, start of string, end of string,
    comma, etc) but not if inside another word like peterbe """
    return re.compile(r"""
        (
        ^ %s
        (?=\W | $)
        |
        (?<=\W)
        %s
        (?=\W | $)
        )
        """ % (word, word), re.I|re.L|re.M|re.X)

def test_createStandaloneWordRegex():
    def T(word, text):
        print createStandaloneWordRegex(word).findall(text)

    T("peter", "So Peter Bengtsson wrote this")
    T("peter", "peter")
    T("peter bengtsson", "So Peter Bengtsson wrote this")

The result of running this is::

['Peter']
['peter']
[] <--- this is the problem!!

It works if the parameter is just one word (e.g. 'peter') but stops
working when it's an expression (e.g. 'peter bengtsson')

How do I modify my regular expression to match on expressions as well
as just single words??

Jul 19 '05 #1
pe*****@gmail.com wrote:
> It works if the parameter is just one word (eg. 'peter') but stops
> working when it's an expression (eg. 'peter bengtsson')

No, not when it's an "expression" (whatever that means), but when the
parameter contains whitespace, which is ignored in verbose mode.

> How do I modify my regular expression to match on expressions as well
> as just single words??

If you must stick with re.X, you must escape any whitespace characters
in your "word" -- see re.escape().
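
A minimal sketch of the re.escape() route, written for Python 3 rather than the Python 2 of the original post. The snake_case function name is made up for this example, and re.L is left out on purpose because Python 3 rejects it for str patterns. re.escape() backslash-escapes the space, so verbose mode no longer discards it:

```python
import re

def create_standalone_word_regex(word):
    # re.escape() turns "peter bengtsson" into "peter\ bengtsson",
    # so the space survives re.X (verbose) mode.
    escaped = re.escape(word)
    return re.compile(r"""
        (
            ^ %s (?=\W | $)        # at the start of a line
          |
            (?<=\W) %s (?=\W | $)  # or right after a non-word character
        )
        """ % (escaped, escaped), re.I | re.M | re.X)

regex = create_standalone_word_regex("peter bengtsson")
print(regex.findall("So Peter Bengtsson wrote this"))  # → ['Peter Bengtsson']
```

Escaping also protects against search terms that happen to contain regex metacharacters such as "." or "+".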

Alternatively (1), drop re.X but this is ugly:

regex_text_no_X = r"(^%s(?=\W|$)|(?<=\W)%s(?=\W|$))" % (word, word)

Alternatively (2), consider using the \b gadget; this appears to give
the same answers as the baroque method:

regex_text_no_flab = r"\b%s\b" % word
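
To check the "same answers" claim on the sample strings from the question, here is a small Python 3 sketch that runs both patterns side by side. The helper names lookaround_regex and boundary_regex are invented for this comparison, and both helpers escape the search term first, which the one-liners above do not:

```python
import re

def lookaround_regex(word):
    # The longer pattern from this thread, written without re.X.
    w = re.escape(word)
    return re.compile(r"(^%s(?=\W|$)|(?<=\W)%s(?=\W|$))" % (w, w), re.I | re.M)

def boundary_regex(word):
    # The short \b version.
    w = re.escape(word)
    return re.compile(r"\b%s\b" % w, re.I)

samples = [
    "peter bengtsson",
    " hey peter,",
    "thepeter bengtsson",
    "hey peterbe,",
    "So Peter Bengtsson wrote this",
]

for text in samples:
    long_form = lookaround_regex("peter").findall(text)
    short_form = boundary_regex("peter").findall(text)
    print(repr(text), long_form, short_form)
```

On these inputs the two agree. They can differ when the search term itself starts or ends with a non-word character: for "c++" in "c++ rocks" the lookaround version matches, while the \b version does not, because there is no word boundary between "+" and a space.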
HTH,
John

Jul 19 '05 #2
On Tue, 14 Jun 2005 13:01:58 +0200, pe*****@gmail.com wrote
(in article <11********************@g49g2000cwa.googlegroups.com>):
> How do I modify my regular expression to match on expressions as well
> as just single words??

import re

def createStandaloneWordRegex(word):
    """ return a regular expression that can find 'peter' only if it's
    written alone (next to space, start of string, end of string,
    comma, etc) but not if inside another word like peterbe """

    return re.compile(r'\b' + word + r'\b', re.I)

def test_createStandaloneWordRegex():
    def T(word, text):
        print createStandaloneWordRegex(word).findall(text)

    T("peter", "So Peter Bengtsson wrote this")
    T("peter", "peter")
    T("peter bengtsson", "So Peter Bengtsson wrote this")

test_createStandaloneWordRegex()

Works?

Jul 19 '05 #3
On 14 Jun 2005 04:01:58 -0700, rumours say that "pe*****@gmail.com"
<pe*****@gmail.com> might have written:
> I want to match a word against a string such that 'peter' is found in
> "peter bengtsson" or " hey peter," but not in "thepeter bengtsson" or
> "hey peterbe," because the word has to stand on its own. The following
> code works for a single word:

[snip]

Use \b before and after the word you are searching for, in a raw string
so that \b is not read as a backspace, for example:

rePeter = re.compile(r"\bpeter\b", re.I)
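
One detail worth checking with a pattern like this is the string prefix. In an ordinary Python string literal "\b" is the backspace character, so the regex engine never sees a word-boundary assertion; a raw string (r"...") passes the backslash through. A short Python 3 sketch of the difference, with variable names chosen only for this illustration:

```python
import re

text = "So Peter Bengtsson wrote this"

# Ordinary literal: "\b" becomes a backspace (\x08), so the pattern
# looks for backspace + "peter" + backspace and finds nothing.
plain = re.compile("\bpeter\b", re.I)

# Raw literal: the regex engine receives \b and treats it as a word boundary.
raw = re.compile(r"\bpeter\b", re.I)

print(plain.findall(text))  # → []
print(raw.findall(text))    # → ['Peter']
```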

In the documentation for the re module, Subsection 4.2.1 is Regular
Expression Syntax; it'll help a lot if you read it.

Cheers.
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 19 '05 #4
Thank you! I had totally forgotten about that. It works.

Jul 19 '05 #5
