
text analysis in python

Hi,

I'm a postgraduate and my project deals with a fair bit of text
analysis. I'm looking for libraries and tools that are geared
towards text analysis (and text engineering). So far, the most
comprehensive toolkit in Python for my purpose is NLTK (Natural Language
Toolkit) by Edward Loper and Steven Bird, followed by mxTextTools. Are
there any OSS tools out there that are more comprehensive than NLTK?

In the Java world, there is GATE (General Architecture for Text
Engineering) and it seems very impressive. Is there something like that
for Python?

Thanks in advance.

Cheers
Maurice

Jul 18 '05 #1


The book "Text Processing in Python" by David Mertz, available online
at http://gnosis.cx/TPiP/ , may be helpful.

Jul 18 '05 #3

You might try http://web.media.mit.edu/~hugo/montylingua/

"Liu, Hugo (2004). MontyLingua: An end-to-end natural
language processor with common sense. Available
at: web.media.mit.edu/~hugo/montylingua."
Jul 18 '05 #5

Mark Winrock wrote:


You might try http://web.media.mit.edu/~hugo/montylingua/

"Liu, Hugo (2004). MontyLingua: An end-to-end natural
language processor with common sense. Available
at: web.media.mit.edu/~hugo/montylingua."

Thanks Mark. I've downloaded MontyLingua and it looks pretty cool. To
me, it seems geared to people like myself who need
something to process written text but do not need the hardcore nuts and
bolts of a computational linguist. NLTK is more of a nuts-and-bolts
toolkit. GATE still seems more advanced than MontyLingua but to a
different end.

Is there anyone in this forum who is using or has used MontyLingua and
is happy to comment more on it? I'm happy to get more opinions.

Thanks and cheers
Maurice
Jul 18 '05 #6

"Maurice LING" <ma*********@acm.org> wrote in message
news:42**************@acm.org...
Say I code my stuff in Jython (importing Java libraries) in a file
"text.py"

Just to be clear, Jython is not a separate language that you code *in*, but
a separate implementation that you may code slightly differently *for*.

... Will there be any issues when I try to import text.py into CPython?

If text.py is written in an appropriate version of Python, it itself will
cause no problem. However, when it imports Java code files, as opposed to
CPython bytecode files, CPython will choke.
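
The failure described above can be sketched in a few lines. This is only an illustration, assuming a modern Python 3 interpreter and assuming no third-party module named "java" is installed; the helper name java_libraries_available is made up for this example. Under Jython the import of java.lang resolves against the JVM, while under CPython it raises ImportError.

```python
import platform


def java_libraries_available():
    """Return True if this interpreter can import Java packages.

    Hypothetical helper for illustration: Jython resolves java.lang through
    the JVM, while CPython has no such module and raises ImportError.
    """
    try:
        import java.lang  # noqa: F401  (only importable under Jython)
    except ImportError:
        return False
    return True


# platform.python_implementation() reports e.g. "CPython" or "Jython".
implementation = platform.python_implementation()

if java_libraries_available():
    print(implementation + ": Java imports work")
else:
    print(implementation + ": Java imports are unavailable")
```

A module like text.py could use such a guard to fail with a clearer message, but it cannot make the Java libraries usable from CPython.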

Terry J. Reedy

Jul 18 '05 #7

Maurice Ling wrote:
In the Java world, there is GATE (general architecture for text
engineering) and it seems very impressive. Are there something like that
for Python?


I worked with GATE this last summer and really hated it. Can't decide
whether that was just my growing distaste for Java or actually the GATE
API. Anyway, if you're looking for something like GATE that (in my
experience) runs significantly faster, you should look at Ellogon
(www.ellogon.org). It's written in C and TCL, with C++, Java, Perl, and
Python bindings. And I believe, if you have any software already
written for GATE, Ellogon can run those modules directly. I've
personally never done so -- all my modules are written in Python (often
simple wrappers for things like MXPOST, MXTerminator, Charniak's parser,
etc.) I find the Python interface simple and easy to use, and they've
added a number of my suggestions to the API in the last release.

STeVe
Jul 18 '05 #8

Terry Reedy wrote:
"Maurice LING" <ma*********@acm.org> wrote in message
news:42**************@acm.org...
Say I code my stuff in Jython (importing Java libraries) in a file
"text.py"

Just to be clear, Jython is not a separate language that you code *in*, but
a separate implementation that you may code slightly differently *for*.

Yes, I do get this point. Jython is just an implementation of the
Python virtual machine using Java. I do note that there are some
differences; for example, Jython can only handle pure Python modules.
However, I'm not enough of a language expert to tell apart the language
differences between these two implementations of Python, Jython and
CPython. If someone cares to enlighten me, it will be my pleasure to
consult. TIA.

... Will there be any issues when I try to import text.py into CPython?

If text.py is written in an appropriate version of Python, it itself will
cause no problem. However, when it imports Java code files, as opposed to
CPython bytecode files, CPython will choke.

In my example, the file "text.py" is coded in Jython, importing Java
libraries. I do get that I cannot import Java jar files directly into
CPython. What I do not get is what is so special about Jython that
it can "fool" CPython into using Java libraries... or is it that there will
always be a need for both a Java virtual machine and a Python virtual
machine when I use Java libraries in Jython... and import Jython-coded
files into CPython....

Cheers
Maurice
Jul 18 '05 #9

On Mon, 04 Apr 2005 09:36:32 +1000, Maurice LING <ma*********@acm.org>
declaimed the following in comp.lang.python:
Yes, I do get this point. Jython is just an implementation of the
Python virtual machine using Java. I do note that there are some

Pardon? I thought Jython directly used the Java VM... It is not a
Python VM at all. It's the same language at the source level, but a
totally different back-end.

Hence, it requires the JVM to be able to run anything that
imports a Java library. Pure Python (source code) is compatible because
the two implementations will "compile" it into either JVM byte code
(Jython) or classic Python byte code (CPython).

The CPython /run time/ has no facilities for interpreting JVM
byte code and cannot, therefore, process Java library imports.
Similarly, the JVM has no facilities for interfacing with CPython
compiled libraries.
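
To make the "classic Python byte code" side concrete, here is a minimal sketch assuming a modern CPython 3 interpreter (the thread itself predates it). The source string and the file name text.py are stand-ins chosen for illustration. It compiles a small piece of pure Python, shows that the result is CPython-specific byte code, and prints where CPython would cache the compiled file.

```python
import dis
import importlib.util

# Stand-in source for a pure-Python module; "text.py" is only a label here.
source = "def greet(name):\n    return 'Hello, ' + name\n"

# compile() yields a code object holding CPython byte code, not JVM byte code.
code = compile(source, "text.py", "exec")
print(type(code.co_code))  # the raw byte code is a bytes object

# Disassemble to show the CPython-specific instructions.
dis.dis(code)

# Where CPython would cache the compiled module (the path embeds an
# implementation tag such as "cpython-312").
cache_path = importlib.util.cache_from_source("text.py")
print(cache_path)

# The compiled code runs on CPython's own virtual machine.
namespace = {}
exec(code, namespace)
print(namespace["greet"]("world"))
```

Jython takes the same source and produces JVM classes instead, which is why neither runtime can load the other's compiled output.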

--
Wulfraed  Dennis Lee Bieber  KD6MOG
wl*****@ix.netcom.com  wu******@dm.net
Bestiaria Support Staff
Home Page: <http://www.dm.net/~wulfraed/>
Overflow Page: <http://wlfraed.home.netcom.com/>

Jul 18 '05 #10

Maurice LING wrote:
Terry Reedy wrote:
"Maurice LING" <ma*********@acm.org> wrote in message
news:42**************@acm.org...
Say I code my stuff in Jython (importing Java libraries) in a file
"text.py"

Just to be clear, Jython is not a separate language that you code
*in*, but a separate implementation that you may code slightly
differently *for*.

Yes, I do get this point. Jython is just an implementation of the
Python virtual machine using Java. I do note that there are some
differences; for example, Jython can only handle pure Python modules.
However, I'm not enough of a language expert to tell apart the language
differences between these two implementations of Python, Jython and
CPython. If someone cares to enlighten me, it will be my pleasure to
consult. TIA.

That's not strictly correct. The Python virtual machine isn't
implemented at all in Jython; instead the JVM is used as the compilation
target.

... Will there be any issues when I try to import text.py into CPython?

If text.py is written in an appropriate version of Python, it itself
will cause no problem. However, when it imports Java code files, as
opposed to CPython bytecode files, CPython will choke.

In my example, the file "text.py" is coded in Jython, importing Java
libraries. I do get that I cannot import Java jar files directly into
CPython. What I do not get is what is so special about Jython that
it can "fool" CPython into using Java libraries... or is it that there will
always be a need for both a Java virtual machine and a Python virtual
machine when I use Java libraries in Jython... and import Jython-coded
files into CPython....

Jython is pretty much a Python interpreter that compiles Python into JVM
bytecodes. Consequently the amount of "trickery" involved is rather
less, though clearly there is some (automated conversion between Java
and Python data types where appropriate, and automated signature-based
selection of the appropriate Java method being the two most obvious).
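
The idea of signature-based method selection can be shown with a toy model in plain Python. This is not Jython's actual mechanism, which lives in its Java runtime and handles far more cases; the table OVERLOADS, the describe_* functions and call_overloaded are all hypothetical names invented for this sketch. It only shows the principle: look at the types of the arguments and pick the matching overload.

```python
# Toy model only: Jython's real overload resolution is implemented in its
# Java runtime and covers many more cases (subclasses, None, arrays, ...).

def describe_int(value):
    return "describe(int): %d" % value


def describe_double(value):
    return "describe(double): %r" % value


def describe_string(value):
    return "describe(String): %s" % value


# Hypothetical overload table: Python argument types -> chosen "Java" method.
OVERLOADS = {
    (int,): describe_int,
    (float,): describe_double,
    (str,): describe_string,
}


def call_overloaded(*args):
    """Pick the overload whose signature matches the argument types exactly."""
    signature = tuple(type(arg) for arg in args)
    try:
        method = OVERLOADS[signature]
    except KeyError:
        names = ", ".join(t.__name__ for t in signature)
        raise TypeError("no overload accepts (%s)" % names)
    return method(*args)


print(call_overloaded(42))
print(call_overloaded(2.5))
print(call_overloaded("text"))
```

The caller never names the overload; the argument types decide, which is the convenience being described for calling Java methods from Python source.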

regards
Steve
--
Steve Holden +1 703 861 4237 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/

Jul 18 '05 #11
