find and replace with regular expressions

chrispoliquin

I am using regular expressions to search a string (always full
sentences, maybe more than one sentence) for common abbreviations and
remove the periods. I need to break the string into different
sentences but split('.') doesn't solve the whole problem because of
possible periods in the middle of a sentence.

So I have...

----------------

import re

middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')

# this will find abbreviations like e.g. or i.e. in the middle of a
# sentence.
# then I want to remove the periods.

----------------

I want to keep the ie or eg but just take out the periods. Any
ideas? Of course newString = middle_abbr.sub('',txt) where txt is the
string will take out the entire abbreviation with the alphanumeric
characters included.

Jul 31 '08 #1


Mensanator

>>> middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')
>>> s = 'A test, i.e., an example.'
>>> a = middle_abbr.search(s)    # find the abbreviation
>>> b = re.compile('\.')         # period pattern
>>> c = b.sub('', a.group(0))    # remove periods from abbreviation
>>> d = middle_abbr.sub(c, s)    # substitute new abbr for old
>>> d
'A test, ie, an example.'

Jul 31 '08 #2

Mensanator

A more versatile version:

import re

middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
s = 'A test, i.e., an example.'
a = middle_abbr.search(s)     # find the abbreviation
b = re.compile(r'\.')         # period pattern
c = b.sub('', a.group(0))     # remove periods from abbreviation
d = middle_abbr.sub(c, s)     # substitute new abbr for old

print(d)
print()
print()

s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr."""

a = middle_abbr.search(s)     # find the (first) abbreviation
c = b.sub('', a.group(0))     # remove periods from abbreviation
d = middle_abbr.sub(c, s)     # every match is replaced by that one abbr

print(d)
print()
print()

s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr.
A multi-test, e.g., one with different abbr."""

done = False

while not done:
    a = middle_abbr.search(s)           # find the next abbreviation
    if a:
        c = b.sub('', a.group(0))       # remove periods from abbreviation
        s = middle_abbr.sub(c, s, 1)    # substitute new abbr for old ONCE
    else:                               # repeat until all removed
        done = True

print(s)

## A test, ie, an example.
##
##
## A test, ie, an example.
## Yet another test, ie, example with 2 abbr.
##
##
## A test, ie, an example.
## Yet another test, ie, example with 2 abbr.
## A multi-test, eg, one with different abbr.
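
The search-and-substitute loop can probably be collapsed into a single pass, because `re.sub` also accepts a function as the replacement and calls it once per match. A minimal sketch, assuming the same pattern as above; the helper name `strip_periods` is made up for illustration and is not from the original post:

```python
import re

middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')

def strip_periods(match):
    # Hypothetical helper: called once per match, returns the matched
    # text with its periods removed ('i.e.' -> 'ie').
    return match.group(0).replace('.', '')

s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr.
A multi-test, e.g., one with different abbr."""

result = middle_abbr.sub(strip_periods, s)
print(result)
```

Each match is rewritten from its own text, so mixed abbreviations such as i.e. and e.g. are handled without re-scanning the string.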

Jul 31 '08 #3

Paul McGuire

On Jul 31, 3:07 pm, chrispoliq...@gmail.com wrote:

> middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')

When defining re's with string literals, it is good practice to use
the raw string literal format (precede with an 'r'):
middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
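
A small illustration of why the raw form matters, using `\b` rather than `\.` because that is where the difference actually bites: in a normal string literal `'\b'` is a backspace character, while `r'\b'` reaches the regex engine as a word-boundary escape. The sample text is invented for the demonstration:

```python
import re

text = 'cat concat cat.'

# Raw literal: backslash + 'b' is passed through, so \b means "word boundary".
raw_hits = re.findall(r'\bcat\b', text)

# Normal literal: Python turns '\b' into a backspace (one character)
# before the regex engine ever sees it, so nothing matches.
plain_hits = re.findall('\bcat\b', text)

print(len(r'\b'), len('\b'))
print(raw_hits)
print(plain_hits)
```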

What abbreviations have numeric digits in them?

I hope your input string doesn't include something like this:
For a good approximation of pi, use 3.1.
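
A quick check of that case. The letters-only pattern below is only one possible fix and is an assumption about what the original poster wants, not something stated in the question:

```python
import re

with_digits = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
letters_only = re.compile(r'[A-Za-z]\.[A-Za-z]\.')   # assumed alternative

s = 'For a good approximation of pi, use 3.1.'

digit_hits = with_digits.findall(s)     # the number is mistaken for an abbreviation
letter_hits = letters_only.findall(s)   # the number is left alone

print(digit_hits)
print(letter_hits)
```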

-- Paul

Jul 31 '08 #4

MRAB

It's recommended that you use raw strings for regular
expressions.

Capture the letters using parentheses:

middle_abbr = re.compile(r'([A-Za-z0-9])\.([A-Za-z0-9])\.')

and replace what was found with what was captured:

newString = middle_abbr.sub(r'\1\2', txt)
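
Put together as a runnable sketch; the sample text and the name `new_string` are invented here for illustration:

```python
import re

# Each parenthesised character class is a capture group: \1 and \2 in the
# replacement refer to the two captured characters, so only the periods
# are dropped.
middle_abbr = re.compile(r'([A-Za-z0-9])\.([A-Za-z0-9])\.')

txt = 'A test, i.e., an example. A multi-test, e.g., one with different abbr.'
new_string = middle_abbr.sub(r'\1\2', txt)
print(new_string)
```

The sentence-ending periods survive because they are not followed by another single character and period.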

HTH

Jul 31 '08 #5

dusans

It's impossible with regex. U could try it with statistical analysis,
and even this would give u a good split.

Aug 1 '08 #6

dusans

"and even this wont* give u a good split." :P

Aug 1 '08 #7
