Re: re

Actually using regular expressions for the first
time. Is there something that allows you to take the
union of two character sets, or append a character to
a character set?

Say I want to replace 'disc' with 'disk', but only
when 'disc' is a complete word (don't want to change
'discuss' to 'diskuss'.) The following seems almost
right:

[^a-zA-Z]disc[^a-zA-Z]

The problem is that that doesn't match if 'disc' is at
the start or end of the string. Of course I could just
combine a few re's with |, but it seems like there should
(or might?) be a way to simply append a \A to the first
[^a-zA-Z] and a \Z to the second.
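
A minimal sketch of the problem described above, for illustration only. The sample strings are made up and are not from the original post:

```python
import re

# The pattern from the post: 'disc' with a non-letter on each side.
pattern = re.compile(r"[^a-zA-Z]disc[^a-zA-Z]")

# Hypothetical sample strings, chosen only to show the behaviour.
samples = ["a disc here", "disc at start", "ends with disc", "discuss"]

for s in samples:
    # Only the first sample matches: the others lack a character
    # before or after 'disc', or continue with more letters.
    print(repr(s), "->", bool(pattern.search(s)))
```

Each character class has to consume a real character, which is why the pattern cannot match when 'disc' touches either end of the string.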

--
David C. Ullrich
Jun 27 '08 #1
David C. Ullrich wrote:
> Actually using regular expressions for the first
> time. Is there something that allows you to take the
> union of two character sets, or append a character to
> a character set?
>
> Say I want to replace 'disc' with 'disk', but only
> when 'disc' is a complete word (don't want to change
> 'discuss' to 'diskuss'.) The following seems almost
> right:
>
> [^a-zA-Z]disc[^a-zA-Z]
>
> The problem is that that doesn't match if 'disc' is at
> the start or end of the string. Of course I could just
> combine a few re's with |, but it seems like there should
> (or might?) be a way to simply append a \A to the first
> [^a-zA-Z] and a \Z to the second.

Why not

($|[\w])disc(^|[^\w])

I hope \w is really the literal for whitespace - might be something
different, see the docs.

Diez
Jun 27 '08 #2
"Diez B. Roggisch" <de***@nospam.web.de> wrote in message
news:6a*************@mid.uni-berlin.de...
> David C. Ullrich wrote:
>> Say I want to replace 'disc' with 'disk', but only
>> when 'disc' is a complete word (don't want to change
>> 'discuss' to 'diskuss'.) The following seems almost
>> right:
>>
>> [^a-zA-Z]disc[^a-zA-Z]
>>
>> The problem is that that doesn't match if 'disc' is at
>> the start or end of the string. Of course I could just
>> combine a few re's with |, but it seems like there should
>> (or might?) be a way to simply append a \A to the first
>> [^a-zA-Z] and a \Z to the second.
>
> Why not
>
> ($|[\w])disc(^|[^\w])
>
> I hope \w is really the literal for whitespace - might be something
> different, see the docs.

No, \s is the literal for whitespace.
http://www.python.org/doc/current/lib/re-syntax.html

But how about:

text = re.sub(r"\bdisc\b", "disk", text_to_be_changed)

\b is the "word break" character: it matches the empty string at the
beginning or end of any "word" (where a word is any sequence of \w
characters, and \w is any alphanumeric character or _).

Note that this solution still doesn't catch "Disc" if it is capitalized.
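
A short sketch of the \b suggestion and the capitalization caveat. The sample sentence is invented for illustration, and the IGNORECASE variant is just one possible option, not something proposed in the post:

```python
import re

# Hypothetical sample text, not from the thread.
text_to_be_changed = "Disc one, disc two, (disc), discuss"

# The \b version from the post: only lowercase 'disc' as a whole word changes.
lower_only = re.sub(r"\bdisc\b", "disk", text_to_be_changed)
print(lower_only)

# With re.IGNORECASE 'Disc' is matched too, but the fixed replacement
# string turns it into lowercase 'disk'.
any_case = re.sub(r"\bdisc\b", "disk", text_to_be_changed, flags=re.IGNORECASE)
print(any_case)
```

Because \b is zero-width, nothing around the word is consumed, so punctuation such as the parentheses is left untouched.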

Russ

Jun 27 '08 #3
In article <6a*************@mid.uni-berlin.de>,
"Diez B. Roggisch" <de***@nospam.web.de> wrote:
> David C. Ullrich wrote:
>> Actually using regular expressions for the first
>> time. Is there something that allows you to take the
>> union of two character sets, or append a character to
>> a character set?
>>
>> Say I want to replace 'disc' with 'disk', but only
>> when 'disc' is a complete word (don't want to change
>> 'discuss' to 'diskuss'.) The following seems almost
>> right:
>>
>> [^a-zA-Z]disc[^a-zA-Z]
>>
>> The problem is that that doesn't match if 'disc' is at
>> the start or end of the string. Of course I could just
>> combine a few re's with |, but it seems like there should
>> (or might?) be a way to simply append a \A to the first
>> [^a-zA-Z] and a \Z to the second.
>
> Why not
>
> ($|[\w])disc(^|[^\w])
>
> I hope \w is really the literal for whitespace - might be something
> different, see the docs.

Thanks, but I don't follow that at all.

Whitespace is actually \s. But [\s]disc[whatever]
doesn't do the job - then it won't match "(disc)",
which counts as "disc" appearing as a full word.

Also I think you have ^ and $ backwards, and there's
a ^ I don't understand. I _think_ that a correct version
of what you're suggesting would be

(^|[^a-zA-Z])disc($|[^a-zA-Z])

But as far as I can see that simply doesn't work.
I haven't been able to use | that way, combining
_parts_ of a re. That was the first thing I tried.
The original works right except for not matching
at the start or end of a string, the thing with
the | doesn't work at all:
>>> test = compile(r'(^|[^a-zA-Z])disc($|[^a-zA-Z])')
>>> test.findall('')
[]
>>> test.findall('disc')
[('', '')]
>>> test.findall(' disc ')
[(' ', ' ')]
>>> disc = compile(r'[^a-zA-Z]disc[^a-zA-Z]')
>>> disc.findall(' disc disc disc')
[' disc ']
>>> disc.findall(' disc disc disc')
[' disc ', ' disc ']
>>> test.findall(' disc disc disc')
[(' ', ' '), (' ', ' ')]
>>> disc.findall(' disc disc disc')
[' disc ', ' disc ']
>>> disc.findall(' disc disc disc ')
[' disc ', ' disc ', ' disc ']

> Diez
--
David C. Ullrich
Jun 27 '08 #4
> Whitespace is actually \s. But [\s]disc[whatever]
> doesn't do the job - then it won't match "(disc)",
> which counts as "disc" appearing as a full word.

Ok, then this works:

import re

test = """
disc
(disc)
foo disc bar
discuss
""".split("\n")

for t in test:
    # Report each line where 'disc' appears as a complete word.
    if re.search(r"(^|[^\w])(disc)($|[^\w])", t):
        print("success:", t)
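
The loop above only tests for a match. A rough sketch of how the same pattern could be used for the actual replacement follows; the helper name replace_disc and the sample strings are assumptions for illustration, not part of the post:

```python
import re

pattern = re.compile(r"(^|[^\w])(disc)($|[^\w])")

def replace_disc(text):
    # Hypothetical helper. Groups 1 and 3 hold the neighbouring characters
    # (or '' at the string ends), so they are written back around 'disk'.
    return pattern.sub(r"\1disk\3", text)

print(replace_disc("disc"))
print(replace_disc("(disc) and discuss"))

# The separator between two adjacent words is consumed by the first match,
# so the second 'disc' here is left unchanged.
print(replace_disc("disc disc"))
```

That last case is a limitation of any pattern that consumes the surrounding characters rather than merely checking them.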

> Also I think you have ^ and $ backwards, and there's
> a ^ I don't understand. I _think_ that a correct version

Yep, sorry for the confusion.

Diez
Jun 27 '08 #5
In article <ma************************************@python.org>,
"Russell Blau" <ru******@hotmail.com> wrote:
> "Diez B. Roggisch" <de***@nospam.web.de> wrote in message
> news:6a*************@mid.uni-berlin.de...
>> David C. Ullrich wrote:
>>> Say I want to replace 'disc' with 'disk', but only
>>> when 'disc' is a complete word (don't want to change
>>> 'discuss' to 'diskuss'.) The following seems almost
>>> right:
>>>
>>> [^a-zA-Z]disc[^a-zA-Z]
>>>
>>> The problem is that that doesn't match if 'disc' is at
>>> the start or end of the string. Of course I could just
>>> combine a few re's with |, but it seems like there should
>>> (or might?) be a way to simply append a \A to the first
>>> [^a-zA-Z] and a \Z to the second.
>>
>> Why not
>>
>> ($|[\w])disc(^|[^\w])
>>
>> I hope \w is really the literal for whitespace - might be something
>> different, see the docs.
>
> No, \s is the literal for whitespace.
> http://www.python.org/doc/current/lib/re-syntax.html
>
> But how about:
>
> text = re.sub(r"\bdisc\b", "disk", text_to_be_changed)
>
> \b is the "word break" character,

Lovely - that's exactly right, thanks. I swear I looked at the
docs... I'm just blind or stupid. No wait, I'm blind _and_
stupid. No, blind and stupid and slow...

Doesn't precisely fit the _spec_ because of digits and underscores,
but it's close enough to solve the problem exactly. Thanks.

> it matches at the beginning or end of any
> "word" (where a word is any sequence of \w characters, and \w is any
> alphanumeric character or _).
>
> Note that this solution still doesn't catch "Disc" if it is capitalized.

Thanks. I didn't mention I wanted to catch both cases because I
already knew how to take care of that:

r"\b[dD]isc\b"
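
For the letters-only spec mentioned earlier in this reply, lookbehind and lookahead assertions are one possibility. This is a sketch under that assumption, not something proposed in the thread; the names strict and loose and the sample strings are invented, and reusing the captured first letter to keep its case is also an assumption about what is wanted:

```python
import re

# Letters-only boundaries: no ASCII letter directly before or after the word.
# Lookarounds check the neighbours without consuming them, and they also
# succeed at the start and end of the string.
strict = re.compile(r"(?<![a-zA-Z])([dD])isc(?![a-zA-Z])")

# The \b version with [dD], for comparison.
loose = re.compile(r"\b([dD])isc\b")

# Hypothetical samples showing where the two differ (digits and underscores).
samples = ["Disc", "(disc)", "disc2", "_disc_", "discuss"]
for s in samples:
    print(s, "->", strict.sub(r"\1isk", s), "|", loose.sub(r"\1isk", s))
```

The two patterns agree except where 'disc' sits next to a digit or an underscore, which \b treats as part of the word.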
> Russ
--
David C. Ullrich
Jun 27 '08 #6
On Wed, 04 Jun 2008 20:07:41 +0200, "Diez B. Roggisch"
<de***@nospam.web.de> wrote:
>> Whitespace is actually \s. But [\s]disc[whatever]
>> doesn't do the job - then it won't match "(disc)",
>> which counts as "disc" appearing as a full word.
>
> Ok, then this works:

Yes it does.

My real question was why doesn't a construction like

(A|B)C

work as expected. The code below shows that it does.
That puzzled me because I couldn't see any real
difference between your solution here and things
I'd tried that didn't work. But those things also
work in the code below - when I saw this just
now I was even more confused...

Oh. Turns out the actual reason for the confusion wasn't
regex syntax, it was the fact that findall doesn't
return what I thought it did - looking at the result
of findall() it seemed as though the re was matching
empty strings and whitespace... Looking more
carefully at what findall is supposed to do everything
makes sense.
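
A small sketch of the findall behaviour in question. The sample text is made up, and the non-capturing and finditer variants are only two possible ways to see the full matches:

```python
import re

# Hypothetical sample text.
text = " disc (disc) discuss"

# With capturing groups, findall returns the groups, not the whole match.
grouped = re.compile(r"(^|[^a-zA-Z])disc($|[^a-zA-Z])")
print(grouped.findall(text))

# Non-capturing groups (?:...) make findall return the full matched text.
ungrouped = re.compile(r"(?:^|[^a-zA-Z])disc(?:$|[^a-zA-Z])")
print(ungrouped.findall(text))

# finditer yields match objects, so group(0) is always the full match.
print([m.group(0) for m in grouped.finditer(text)])
```

So the tuples of spaces and empty strings are the contents of the two groups around each successful match, not evidence that the pattern matched nothing.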

Sorry to be dense. Remind me to read more than the
first sentence next time:

"findall (pattern, string)
Return a list of all non-overlapping matches of pattern in string.
If one or more groups are present in the pattern, return a list of
groups;..."

> import re
>
> test = """
> disc
> (disc)
> foo disc bar
> discuss
> """.split("\n")
>
> for t in test:
>     if re.search(r"(^|[^\w])(disc)($|[^\w])", t):
>         print("success:", t)
>
>> Also I think you have ^ and $ backwards, and there's
>> a ^ I don't understand. I _think_ that a correct version
>
> Yep, sorry for the confusion.
>
> Diez

David C. Ullrich
Jun 27 '08 #7
