NLP, Python and period

Hi,

are you aware of any nlp packages or algorithms in Python to spot
whether a '.' represents an end of sentence or rather something else (eg
Mr., fo*@home.co.uk, etc)?

Thanks

F.
Aug 4 '08 #1
On 4 Aug, 11:59, Fred Mangusta <a...@bbb.it> wrote:
> Hi,
>
> are you aware of any nlp packages or algorithms in Python to spot
> whether a '.' represents an end of sentence or rather something else (eg
> Mr., f...@home.co.uk, etc)?

I wouldn't mind finding out about such packages, either. I see that
NLTK offers a few options, with the following tokeniser being
interesting if you don't mind training the software:

http://nltk.org/doc/guides/tokenize....unkt-tokenizer

There was also discussion of this topic on Ned Batchelder's blog a
while back:

http://nedbatchelder.com/blog/200804...sentences.html

My comment on there (that I'm using a regular expression with some
postprocessing) still stands.

Paul
Aug 4 '08 #2
On Aug 4, 7:59 pm, Fred Mangusta <a...@bbb.it> wrote:
> Hi,
>
> are you aware of any nlp packages or algorithms in Python to spot
> whether a '.' represents an end of sentence or rather something else (eg
> Mr., f...@home.co.uk, etc)?

google("python nltk") ... it may do what you want.
Aug 4 '08 #3
Hi Paul,

thanks for replying. I'm interested in knowing more about your regex
approach, but as you point out in your comment, seems like access to the
sourceforge mail archive is restricted. Is there any way I can read
about it? Would you be so kind to cut and paste it here for instance?

Thanks!
F.

Paul Boddie wrote:
> There was also discussion of this topic on Ned Batchelder's blog a
> while back:
>
> http://nedbatchelder.com/blog/200804...sentences.html
>
> My comment on there (that I'm using a regular expression with some
> postprocessing) still stands.
>
> Paul
Aug 4 '08 #4
On 4 Aug, 12:34, Fred Mangusta <a...@bbb.it> wrote:
>
> thanks for replying. I'm interested in knowing more about your regex
> approach, but as you point out in your comment, seems like access to the
> sourceforge mail archive is restricted. Is there any way I can read
> about it? Would you be so kind to cut and paste it here for instance?

I can't log into SourceForge, possibly because I've forgotten my
password, but I can give you a fairly similar regular expression which
does some of the work:

import re

sentence_pattern = re.compile(
    r'(' +
        r'[\(\"\[]*' +      # Quoting or bracketing (optional)
        r'[A-Za-z0-9]' +    # Match sentence with specific start character
        r'.+?' +            # Match sentence content - "?" means non-greedy
        r'[\.\!\?]' +       # End of sentence
        r'[\)\"\]]*' +      # End quoting or bracketing
    r')' +
    r'(\s+)' +              # Spaces
    r'[\(\"\[]*' +          # Quoting or bracketing (optional)
    r'[A-Z0-9]'             # Match sentence with specific start character
    )

This is mostly the same as that posted to SourceForge, but with some
enhancements; I've indented the part which actually produces the
matched sentence text in a group. Unfortunately, some postprocessing
is required to deal with abbreviations, and I maintain a list of these
against which I test the supposed ends of sentences that the regular
expression provides. In addition, I also try to detect initials (e.g.
G. van Rossum) which the regular expression may regard as the end of a
sentence.
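
The postprocessing described above might look roughly like the sketch below. It is not Paul's actual code: the BOUNDARY pattern is a simplified variant of the one posted (it uses a lookahead so that the first character of the next sentence is not consumed), and the ABBREVIATIONS set and both function names are made up for illustration. A real abbreviation list would be much longer.

```python
import re

# Candidate boundary: end punctuation, optional closing quotes/brackets,
# whitespace, then (as a lookahead) the start of what looks like a sentence.
BOUNDARY = re.compile(r'[.!?][)"\]]*(\s+)(?=[("\[]*[A-Z0-9])')

# Hypothetical, deliberately tiny abbreviation list (lower case, no final dot).
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "eg", "e.g", "ie", "i.e", "etc", "vs"}


def is_false_boundary(text_before):
    """True if text_before ends in a known abbreviation or a single initial."""
    words = text_before.split()
    if not words or not words[-1].endswith('.'):
        return False  # '!' and '?' are always accepted as sentence ends here
    word = words[-1][:-1].lstrip('("[')
    if word.lower() in ABBREVIATIONS:
        return True
    # A lone capital letter followed by '.' is treated as an initial.
    return len(word) == 1 and word.isalpha() and word.isupper()


def split_sentences(text):
    """Split text at candidate boundaries that survive the checks above."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        if is_false_boundary(text[start:match.start() + 1]):
            continue
        sentences.append(text[start:match.start(1)])
        start = match.end(1)
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = ("Mr. Smith met G. van Rossum and J. Doe. Was it useful? "
              "Mail foo@home.co.uk for details.")
    for sentence in split_sentences(sample):
        print(sentence)
```

The dots inside foo@home.co.uk are never candidates because no whitespace follows them, and "G. van" is skipped because the next word starts in lower case. The initial check is crude: a sentence that genuinely ends in a single capital letter would be merged with the one after it.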

As I noted, I'd be interested to hear of any better solutions which
don't involve training.

Paul
Aug 4 '08 #5
