NLP, Python and period

Fred Mangusta

Hi,

Are you aware of any NLP packages or algorithms in Python to spot
whether a '.' represents the end of a sentence or something else (e.g.
Mr., fo*@home.co.uk, etc.)?

Thanks

F.

Aug 4 '08 #1


Paul Boddie

On 4 Aug, 11:59, Fred Mangusta <a...@bbb.it> wrote:

> Hi,
>
> are you aware of any nlp packages or algorithms in Python to spot
> whether a '.' represents an end of sentence or rather something else (eg
> Mr., f...@home.co.uk, etc)?

I wouldn't mind finding out about such packages, either. I see that
NLTK offers a few options, with the following tokeniser being
interesting if you don't mind training the software:

http://nltk.org/doc/guides/tokenize....unkt-tokenizer
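The tokeniser linked above appears to be NLTK's Punkt sentence tokenizer, which learns from unannotated text which tokens are abbreviations. To illustrate what that kind of "training" involves, here is a sketch that uses a much simpler heuristic; it is not Punkt's actual statistics and not NLTK's API. It assumes that a short word carrying a trailing period in (almost) every occurrence is an abbreviation. The function name `learn_abbreviations` and its thresholds are hypothetical.

```python
from collections import Counter


def learn_abbreviations(corpus: str, min_count: int = 2,
                        min_ratio: float = 0.9) -> set[str]:
    """Learn likely abbreviations from raw, unannotated text.

    A simplified heuristic, not the Punkt algorithm: a short alphabetic
    token is taken to be an abbreviation if it occurs at least
    ``min_count`` times and has a trailing period in at least
    ``min_ratio`` of those occurrences.
    """
    with_period = Counter()  # occurrences followed by '.'
    total = Counter()        # all occurrences
    for token in corpus.split():
        # Strip surrounding quotes/brackets but keep a trailing period.
        token = token.strip("\"'()[]")
        word = token.rstrip(".").lower()
        # Only consider short, purely alphabetic words.
        if not word.isalpha() or len(word) > 4:
            continue
        total[word] += 1
        if token.endswith("."):
            with_period[word] += 1
    return {
        word for word, count in total.items()
        if count >= min_count and with_period[word] / count >= min_ratio
    }
```

On the toy corpus "Mr. Smith met Dr. Jones. Later Mr. Brown and Dr. Green arrived. The dog ran. The dog slept. The cat sat." this returns the set {'mr', 'dr'}: "the" and "dog" are rejected because they normally appear without a period, and one-off words such as "ran." fall below `min_count`. Punkt itself uses proper statistical tests and also learns initials and sentence starters.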

There was also discussion of this topic on Ned Batchelder's blog a
while back:

http://nedbatchelder.com/blog/200804...sentences.html

My comment on there (that I'm using a regular expression with some
postprocessing) still stands.

Paul

Aug 4 '08 #2

John Machin

On Aug 4, 7:59 pm, Fred Mangusta <a...@bbb.it> wrote:

> Hi,
>
> are you aware of any nlp packages or algorithms in Python to spot
> whether a '.' represents an end of sentence or rather something else (eg
> Mr., f...@home.co.uk, etc)?

google("python nltk") ... it may do what you want.

Aug 4 '08 #3

Fred Mangusta

Hi Paul,

Thanks for replying. I'm interested in knowing more about your regex
approach, but, as you point out in your comment, access to the
SourceForge mail archive seems to be restricted. Is there any way I can
read about it? Would you be so kind as to paste it here, for instance?

Thanks!
F.

Paul Boddie wrote:

> There was also discussion of this topic on Ned Batchelder's blog a
> while back:
>
> http://nedbatchelder.com/blog/200804...sentences.html
>
> My comment on there (that I'm using a regular expression with some
> postprocessing) still stands.
>
> Paul

Aug 4 '08 #4

Paul Boddie

On 4 Aug, 12:34, Fred Mangusta <a...@bbb.it> wrote:

> thanks for replying. I'm interested in knowing more about your regex
> approach, but as you point out in your comment, seems like access to the
> sourceforge mail archive is restricted. Is there any way I can read
> about it? Would you be so kind to cut and paste it here for instance?

I can't log into SourceForge, possibly because I've forgotten my
password, but I can give you a fairly similar regular expression which
does some of the work:

import re

sentence_pattern = re.compile(
    r'(' +
        r'[\(\"\[]*' +      # Quoting or bracketing (optional)
        r'[A-Za-z0-9]' +    # Match sentence with specific start character
        r'.+?' +            # Match sentence content - "?" means non-greedy
        r'[\.\!\?]' +       # End of sentence
        r'[\)\"\]]*' +      # End quoting or bracketing
    r')' +
    r'(\s+)' +              # Spaces
    r'[\(\"\[]*' +          # Quoting or bracketing (optional)
    r'[A-Z0-9]'             # Match next sentence's specific start character
)
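How the pattern is applied isn't shown in the post. Because it also consumes the whitespace and the opening character of the following sentence, a plain `re.findall` would skip past those characters. One plausible approach (an assumption, not necessarily Paul's) is to call `search` repeatedly and resume from the end of the whitespace group. The pattern is repeated here, with the start-character class written as `[A-Za-z0-9]`, so the example runs on its own; `split_sentences` is a hypothetical helper name.

```python
import re

# Copy of the sentence pattern so this example is self-contained.
sentence_pattern = re.compile(
    r'('
    r'[\(\"\[]*'      # optional opening quote/bracket
    r'[A-Za-z0-9]'    # sentence start character
    r'.+?'            # sentence content, non-greedy
    r'[\.\!\?]'       # end of sentence
    r'[\)\"\]]*'      # optional closing quote/bracket
    r')'
    r'(\s+)'          # whitespace between sentences
    r'[\(\"\[]*'      # optional opening quote/bracket of the next sentence
    r'[A-Z0-9]'       # next sentence start (upper case or digit)
)


def split_sentences(text: str) -> list[str]:
    """Split text into candidate sentences using sentence_pattern."""
    sentences = []
    pos = 0
    while True:
        match = sentence_pattern.search(text, pos)
        if match is None:
            break
        sentences.append(match.group(1))
        # The match also consumed the start of the next sentence, so
        # resume from the end of the whitespace group (group 2).
        pos = match.end(2)
    # Whatever follows the last boundary is the final sentence.
    rest = text[pos:].strip()
    if rest:
        sentences.append(rest)
    return sentences
```

For example, "Hello world. This is a test! Is it? Yes." splits into four sentences, while "Mr. Smith arrived. He sat down." yields ["Mr.", "Smith arrived.", "He sat down."]. The spurious break after "Mr." is exactly why postprocessing against a list of abbreviations is still needed.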

This is mostly the same as the one posted to SourceForge, but with some
enhancements; I've indented the part which actually produces the
matched sentence text in a group. Unfortunately, some postprocessing
is required to deal with abbreviations: I maintain a list of these
against which I test the supposed sentence endings that the regular
expression provides. I also try to detect initials (e.g.
G. van Rossum), which the regular expression may regard as the end of
a sentence.
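The abbreviation list and initials check aren't shown in the post. A minimal sketch of that kind of postprocessing, assuming the regular expression's output is available as a list of candidate sentences: any candidate whose final word is in a (hypothetical, deliberately short) abbreviation set, or is a lone capital letter, is joined to the candidate that follows. `ABBREVIATIONS`, `is_false_end` and `merge_false_ends` are illustrative names, not from the post.

```python
import re

# Hypothetical abbreviation list; the real list is not shown in the post.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "jr", "sr"}

# Last word before a final period, allowing closing quotes/brackets.
LAST_WORD = re.compile(r'(\w+)\.[\)\"\]]*$')


def is_false_end(candidate: str) -> bool:
    """Return True if the candidate probably stops mid-sentence."""
    match = LAST_WORD.search(candidate)
    if match is None:
        return False  # ends with '!' or '?', or no word before the period
    word = match.group(1)
    if word.lower() in ABBREVIATIONS:
        return True
    # A lone capital letter is taken to be an initial, as in "G. van Rossum".
    return len(word) == 1 and word.isupper()


def merge_false_ends(candidates: list[str]) -> list[str]:
    """Join each candidate ending in an abbreviation or initial to the next."""
    sentences = []
    buffer = ""
    for candidate in candidates:
        buffer = f"{buffer} {candidate}" if buffer else candidate
        if not is_false_end(candidate):
            sentences.append(buffer)
            buffer = ""
    if buffer:  # text ended on an abbreviation or initial
        sentences.append(buffer)
    return sentences
```

Given ["Mr.", "Smith arrived.", "He met G.", "van Rossum.", "Done!"] this returns ["Mr. Smith arrived.", "He met G. van Rossum.", "Done!"]. A sentence that genuinely ends with an abbreviation, or with the pronoun "I", will be wrongly merged with the next one; that trade-off is inherent in list-based postprocessing. The candidates are rejoined with a single space, so the original spacing is not preserved.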

As I noted, I'd be interested to hear of any better solutions which
don't involve training.

Paul

Aug 4 '08 #5
