find and replace with regular expressions

I am using regular expressions to search a string (always full
sentences, maybe more than one sentence) for common abbreviations and
remove the periods. I need to break the string into different
sentences but split('.') doesn't solve the whole problem because of
possible periods in the middle of a sentence.

So I have...

----------------

import re

middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')

# this will find abbreviations like e.g. or i.e. in the middle of a
# sentence.
# then I want to remove the periods.

----------------

I want to keep the ie or eg but just take out the periods. Any
ideas? Of course newString = middle_abbr.sub('',txt) where txt is the
string will take out the entire abbreviation with the alphanumeric
characters included.
Jul 31 '08 #1
On Jul 31, 3:07 pm, chrispoliq...@gmail.com wrote:
> I want to keep the ie or eg but just take out the periods. Any
> ideas? Of course newString = middle_abbr.sub('',txt) where txt is the
> string will take out the entire abbreviation with the alphanumeric
> characters included.

>>> middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')
>>> s = 'A test, i.e., an example.'
>>> a = middle_abbr.search(s)    # find the abbreviation
>>> b = re.compile('\.')         # period pattern
>>> c = b.sub('',a.group(0))     # remove periods from abbreviation
>>> d = middle_abbr.sub(c,s)     # substitute new abbr for old
>>> d
'A test, ie, an example.'
Jul 31 '08 #2
On Jul 31, 3:56 pm, Mensanator <mensana...@aol.com> wrote:
> >>> middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')
> >>> s = 'A test, i.e., an example.'
> >>> a = middle_abbr.search(s)    # find the abbreviation
> >>> b = re.compile('\.')         # period pattern
> >>> c = b.sub('',a.group(0))     # remove periods from abbreviation
> >>> d = middle_abbr.sub(c,s)     # substitute new abbr for old
> >>> d
> 'A test, ie, an example.'

A more versatile version:

import re

middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
s = 'A test, i.e., an example.'
a = middle_abbr.search(s)            # find the abbreviation
b = re.compile(r'\.')                # period pattern
c = b.sub('', a.group(0))            # remove periods from abbreviation
d = middle_abbr.sub(c, s)            # substitute new abbr for old

print(d)
print()
print()

s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr."""

a = middle_abbr.search(s)            # find the abbreviation
c = b.sub('', a.group(0))            # remove periods from abbreviation
d = middle_abbr.sub(c, s)            # substitute new abbr for old

print(d)
print()
print()

s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr.
A multi-test, e.g., one with different abbr."""

done = False

while not done:
    a = middle_abbr.search(s)                # find the abbreviation
    if a:
        c = b.sub('', a.group(0))            # remove periods from abbreviation
        s = middle_abbr.sub(c, s, count=1)   # substitute new abbr for old ONCE
    else:                                    # repeat until all removed
        done = True

print(s)

## A test, ie, an example.
##
##
## A test, ie, an example.
## Yet another test, ie, example with 2 abbr.
##
##
## A test, ie, an example.
## Yet another test, ie, example with 2 abbr.
## A multi-test, eg, one with different abbr.
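
The search-then-substitute loop can probably be collapsed into a single pass, since re.sub also accepts a function as the replacement and calls it once per match. A minimal sketch, assuming Python 3; the helper name strip_periods is made up for illustration and is not from the thread:

```python
import re

middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')

def strip_periods(match):
    # Called once per match; returns the matched text without its periods.
    return match.group(0).replace('.', '')

s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr.
A multi-test, e.g., one with different abbr."""

result = middle_abbr.sub(strip_periods, s)
print(result)
```

Because each match is rewritten from its own text, this handles different abbreviations in the same string without repeating the search.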
Jul 31 '08 #3
On Jul 31, 3:07 pm, chrispoliq...@gmail.com wrote:
> middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')

When defining re's with string literals, it is good practice to use
the raw string literal format (precede with an 'r'):
middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
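
To see why the raw form matters, it may help to compare how Python itself reads the backslashes before the re module ever sees them. A small sketch with a made-up test string; the pattern '\.' happens to survive in a normal literal because it is not a recognized escape (newer Python versions warn about it), but other sequences such as '\b' do not:

```python
import re

# In a normal literal Python processes backslash escapes first:
# '\b' is a single backspace character, while r'\b' is the two
# characters backslash + 'b', which the regex engine reads as a
# word boundary.
plain = '\bcat\b'
raw = r'\bcat\b'

print(len(plain), len(raw))             # the two literals differ in length
print(re.search(plain, 'a cat sat'))    # looks for literal backspaces
print(re.search(raw, 'a cat sat'))      # looks for the word 'cat'
```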

What abbreviations have numeric digits in them?

I hope your input string doesn't include something like this:
For a good approximation of pi, use 3.1.

-- Paul
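
The pi sentence can be checked directly. A short sketch comparing the posted pattern with a letters-only variant; the names with_digits and letters_only are illustrative, and dropping the digits is only one possible fix:

```python
import re

with_digits = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
letters_only = re.compile(r'[A-Za-z]\.[A-Za-z]\.')

s = 'For a good approximation of pi, use 3.1.'

digit_hit = with_digits.search(s)
letter_hit = letters_only.search(s)

print(digit_hit.group(0))   # the pattern with digits picks up the number
print(letter_hit)           # the letters-only pattern finds nothing here
```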
Jul 31 '08 #4
It's recommended that you use raw strings for regular
expressions.

Capture the letters using parentheses:

middle_abbr = re.compile(r'([A-Za-z0-9])\.([A-Za-z0-9])\.')

and replace what was found with what was captured:

newString = middle_abbr.sub(r'\1\2', txt)
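
Put together, the two lines above might be used like this. A minimal sketch, assuming Python 3; the sample text is made up for illustration:

```python
import re

# Each letter is captured in its own group; the periods are matched
# but not captured, so they are dropped by the replacement.
middle_abbr = re.compile(r'([A-Za-z0-9])\.([A-Za-z0-9])\.')

txt = 'A test, i.e., an example. A multi-test, e.g., one with different abbr.'
newString = middle_abbr.sub(r'\1\2', txt)
print(newString)
```

The backreferences \1 and \2 are filled in separately for every match, so different abbreviations in the same string are handled in one call.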

HTH
Jul 31 '08 #5
It's impossible with regex. U could try it with a statistical analysis;
and even this would give u a good split.
Aug 1 '08 #6
On Aug 1, 12:53 pm, dusans <dusan.smit...@gmail.com> wrote:
> It's impossible with regex. U could try it with a statistical analysis;
> and even this would give u a good split.

"and even this won't give u a good split." :P
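
A small experiment suggests where a purely pattern-based approach runs out. The sketch below strips periods from two-letter abbreviations and then splits after sentence-ending punctuation; the function name split_sentences, the splitting rule, and the sample sentences are all assumptions for illustration, not something proposed in the thread:

```python
import re

middle_abbr = re.compile(r'([A-Za-z])\.([A-Za-z])\.')

def split_sentences(text):
    # Strip periods from two-letter abbreviations, then split after
    # '.', '!' or '?' when followed by whitespace.
    cleaned = middle_abbr.sub(r'\1\2', text)
    return re.split(r'(?<=[.!?])\s+', cleaned)

ok = split_sentences('Use a tool, e.g., a regex. It works here.')
bad = split_sentences('Ask Dr. Smith about it. He knows.')

print(ok)    # two sentences, as intended
print(bad)   # 'Dr.' is treated as a sentence end
```

Abbreviations that do not fit the letter-period-letter-period shape, such as titles, still break the split, which is the kind of case the posters above have in mind.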
Aug 1 '08 #7