find and replace with regular expressions

I am using regular expressions to search a string (always full
sentences, maybe more than one sentence) for common abbreviations and
remove the periods. I need to break the string into different
sentences but split('.') doesn't solve the whole problem because of
possible periods in the middle of a sentence.

So I have...

----------------

import re

middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')

# This will find abbreviations like e.g. or i.e. in the middle of a
# sentence. Then I want to remove the periods.

----------------

I want to keep the ie or eg but just take out the periods. Any
ideas? Of course, newString = middle_abbr.sub('', txt), where txt is the
string, takes out the entire abbreviation, alphanumeric characters
included.
Jul 31 '08 #1
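For readers following along, the two problems described in the question can be reproduced in a few lines. This is only a sketch: the sample text in `txt` is made up for illustration, and the pattern is the one from the post, written as a raw string.

```python
import re

# The pattern from the post above (written as a raw string here).
middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')

# Hypothetical sample input: two sentences, one containing "i.e.".
txt = 'A test, i.e., an example. Another sentence.'

# Problem 1: split('.') also cuts inside the abbreviation.
pieces = txt.split('.')
print(pieces)

# Problem 2: substituting '' removes the letters along with the periods.
stripped = middle_abbr.sub('', txt)
print(stripped)
```

The first print shows "i.e." broken into separate pieces, and the second shows the abbreviation missing entirely rather than reduced to "ie".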
On Jul 31, 3:07 pm, chrispoliq...@gmail.com wrote:
> I want to keep the ie or eg but just take out the periods. Any
> ideas?

>>> import re
>>> middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
>>> s = 'A test, i.e., an example.'
>>> a = middle_abbr.search(s)    # find the abbreviation
>>> b = re.compile(r'\.')        # period pattern
>>> c = b.sub('', a.group(0))    # remove periods from abbreviation
>>> d = middle_abbr.sub(c, s)    # substitute new abbr for old
>>> d
'A test, ie, an example.'
Jul 31 '08 #2
On Jul 31, 3:56 pm, Mensanator <mensana...@aol.com> wrote:
> d = middle_abbr.sub(c,s)   # substitute new abbr for old
> d
> 'A test, ie, an example.'

A more versatile version:

import re

middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
s = 'A test, i.e., an example.'
a = middle_abbr.search(s)      # find the abbreviation
b = re.compile(r'\.')          # period pattern
c = b.sub('', a.group(0))      # remove periods from abbreviation
d = middle_abbr.sub(c, s)      # substitute new abbr for old

print(d)
print()
print()

s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr."""

a = middle_abbr.search(s)      # find the abbreviation
c = b.sub('', a.group(0))      # remove periods from abbreviation
d = middle_abbr.sub(c, s)      # substitute new abbr for old (all matches)

print(d)
print()
print()

s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr.
A multi-test, e.g., one with different abbr."""

done = False

while not done:
    a = middle_abbr.search(s)              # find the next abbreviation
    if a:
        c = b.sub('', a.group(0))          # remove periods from abbreviation
        s = middle_abbr.sub(c, s, 1)       # substitute new abbr for old ONCE
    else:                                  # repeat until all removed
        done = True

print(s)

## A test, ie, an example.
##
##
## A test, ie, an example.
## Yet another test, ie, example with 2 abbr.
##
##
## A test, ie, an example.
## Yet another test, ie, example with 2 abbr.
## A multi-test, eg, one with different abbr.
Jul 31 '08 #3
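The search-and-substitute loop in the post above can probably be collapsed into a single pass, because `re.sub` also accepts a function as the replacement and calls it once per match. The sketch below assumes the same pattern and sample text; the helper name `strip_periods` is hypothetical, not something from the thread.

```python
import re

middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')

def strip_periods(match):
    # Called once per match; returns the matched text without its periods.
    return match.group(0).replace('.', '')

s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr.
A multi-test, e.g., one with different abbr."""

# Each abbreviation is rewritten from its own matched text, so mixed
# abbreviations ("i.e." and "e.g.") are handled without a loop.
result = middle_abbr.sub(strip_periods, s)
print(result)
```

Since the replacement is computed per match, there is no need to search first or to limit the substitution count.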
On Jul 31, 3:07 pm, chrispoliq...@gmail.com wrote:
> middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')

When defining re's with string literals, it is good practice to use
the raw string literal format (precede with an 'r'):
middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')

What abbreviations have numeric digits in them?

I hope your input string doesn't include something like this:
For a good approximation of pi, use 3.1.

-- Paul
Jul 31 '08 #4
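Paul's concern about digits is easy to check. The sketch below compares the original character class with a letters-only variant; the names `with_digits` and `letters_only` are made up here, and dropping the digits is just one possible tightening, not something the thread settled on.

```python
import re

# Original character class (letters and digits) versus a letters-only variant.
with_digits = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
letters_only = re.compile(r'[A-Za-z]\.[A-Za-z]\.')

txt = 'For a good approximation of pi, use 3.1.'

# The original pattern treats the number plus the final period as an abbreviation.
false_positive = with_digits.search(txt)
print(false_positive.group(0))

# The letters-only pattern leaves the number alone...
print(letters_only.search(txt))

# ...and still finds a real abbreviation.
print(letters_only.search('A test, i.e., an example.').group(0))
```

A letters-only class removes this particular false positive, though other inputs could still fool it.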
On Jul 31, 9:07 pm, chrispoliq...@gmail.com wrote:
> I want to keep the ie or eg but just take out the periods. Any
> ideas? Of course newString = middle_abbr.sub('',txt) where txt is the
> string will take out the entire abbreviation with the alphanumeric
> characters included.

It's recommended that you use raw strings for regular
expressions.

Capture the letters using parentheses:

middle_abbr = re.compile(r'([A-Za-z0-9])\.([A-Za-z0-9])\.')

and replace what was found with what was captured:

newString = middle_abbr.sub(r'\1\2', txt)

HTH
Jul 31 '08 #5
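Put together, the capture-group suggestion above might look like the following sketch. The sample text is invented for illustration, and `subn` is used instead of `sub` only to show how many abbreviations were rewritten.

```python
import re

# Parentheses capture the two characters; the periods stay outside the groups.
middle_abbr = re.compile(r'([A-Za-z0-9])\.([A-Za-z0-9])\.')

txt = 'A test, i.e., an example. A multi-test, e.g., one with different abbr.'

# \1 and \2 refer back to the captured characters, so only the periods are dropped.
# subn returns the new string together with the number of substitutions made.
new_string, count = middle_abbr.subn(r'\1\2', txt)
print(new_string)
print(count)
```

Note that this pattern only covers two-character abbreviations; longer ones would need a different pattern.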
On Jul 31, 10:07 pm, chrispoliq...@gmail.com wrote:
> I need to break the string into different
> sentences but split('.') doesn't solve the whole problem because of
> possible periods in the middle of a sentence.

It's impossible with regex. U could try it with a statistical analysis;
and even this would give u a good split.
Aug 1 '08 #6
On Aug 1, 12:53 pm, dusans <dusan.smit...@gmail.com> wrote:
> It's impossible with regex. U could try it with a statistical analysis;
> and even this would give u a good split.

"and even this won't give u a good split." :P
Aug 1 '08 #7
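The last two posts are right that no regex splits sentences reliably in general, but a heuristic can get reasonably far on simple text. The sketch below is one such heuristic, not a complete solution: it splits only where sentence-ending punctuation is followed by whitespace and a capital letter. The names `SENTENCE_BOUNDARY` and `split_sentences` are hypothetical.

```python
import re

# Heuristic boundary: ., ! or ? followed by whitespace and an uppercase letter.
SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z])')

def split_sentences(text):
    # Split at the heuristic boundaries; abbreviations followed by a comma
    # or a lowercase word are left intact.
    return SENTENCE_BOUNDARY.split(text.strip())

good = split_sentences('A test, i.e., an example. Use e.g. this one. Done!')
print(good)

# Known failure: a title before a capitalized name looks like a boundary.
bad = split_sentences('Dr. Smith arrived.')
print(bad)
```

The second call shows the kind of case that defeats the heuristic, which is why a statistical or dictionary-based approach is usually suggested for real text.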
