Text analysis in Python

Hi,

I'm a postgraduate and my project deals with a fair bit of text
analysis. I'm looking for libraries and tools that are geared
towards text analysis (and text engineering). So far, the most
comprehensive toolkit in Python for my purpose is NLTK (Natural Language
Toolkit) by Edward Loper and Steven Bird, followed by mxTextTools. Are
there any OSS tools out there that are more comprehensive than NLTK?

In the Java world, there is GATE (General Architecture for Text
Engineering) and it seems very impressive. Is there something like that
for Python?

Thanks in advance.

Cheers
Maurice

Jul 18 '05 #1
In article <ma**************************************@python.org>,
Maurice Ling <ma*********@acm.org> wrote:
Jul 18 '05 #2
The book "Text Processing in Python" by David Mertz, available online
at http://gnosis.cx/TPiP/ , may be helpful.

Jul 18 '05 #3
Jul 18 '05 #4
You might try http://web.media.mit.edu/~hugo/montylingua/

"Liu, Hugo (2004). MontyLingua: An end-to-end natural
language processor with common sense. Available
at: web.media.mit.edu/~hugo/montylingua."
Jul 18 '05 #5
Mark Winrock wrote:
> You might try http://web.media.mit.edu/~hugo/montylingua/
>
> "Liu, Hugo (2004). MontyLingua: An end-to-end natural
> language processor with common sense. Available
> at: web.media.mit.edu/~hugo/montylingua."

Thanks Mark. I've downloaded MontyLingua and it looks pretty cool. To
me, it seems pretty much geared to people like myself who need
something to process written text but do not need the hardcore nuts and
bolts of a computational linguist. NLTK is more of the nuts-and-bolts
toolkit. GATE still seems more advanced than MontyLingua but to a
different end.

Is there anyone in this forum who is using or has used MontyLingua and
is happy to comment more on it? I'm happy to get more opinions.

Thanks and cheers
Maurice
Jul 18 '05 #6

"Maurice LING" <ma*********@acm.org> wrote in message
news:42**************@acm.org...
> Say I code my stuffs in Jython (importing java libraries) in a file
> "text.py"

Just to be clear, Jython is not a separate language that you code *in*, but
a separate implementation that you may code slightly differently *for*.

> ... Will there be any issues when I try to import text.py into CPython?

If text.py is written in an appropriate version of Python, it itself will
cause no problem. However, when it imports Java code files, as opposed to
CPython bytecode files, CPython will choke.
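
The failure described above can be seen, and guarded against, with an ordinary try/except around the Java import. This is only a minimal sketch: the helper name `load_java_list_class` and the fallback to a plain list are illustrative assumptions, not something from the posts in this thread.

```python
import platform


def load_java_list_class():
    """Return java.util.ArrayList where Java packages exist, else None."""
    try:
        # Resolves only on an implementation that can see Java packages
        # (for example Jython running on a JVM).
        from java.util import ArrayList
    except ImportError:
        # CPython has no "java" package, so the import fails here.
        return None
    return ArrayList


array_list_class = load_java_list_class()
if array_list_class is None:
    print("No Java libraries on %s; using a plain Python list"
          % platform.python_implementation())
    items = []
else:
    items = array_list_class()
```

A module written this way can at least be imported by both implementations, although any code path that truly needs the Java library still requires a JVM.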

Terry J. Reedy

Jul 18 '05 #7
Maurice Ling wrote:
> In the Java world, there is GATE (general architecture for text
> engineering) and it seems very impressive. Is there something like that
> for Python?


I worked with GATE this last summer and really hated it. Can't decide
whether that was just my growing distaste for Java or actually the GATE
API. Anyway, if you're looking for something like GATE that (in my
experience) runs significantly faster, you should look at Ellogon
(www.ellogon.org). It's written in C and Tcl, with C++, Java, Perl, and
Python bindings. And I believe, if you have any software already
written for GATE, Ellogon can run those modules directly. I've
personally never done so -- all my modules are written in Python (often
simple wrappers for things like MXPOST, MXTerminator, Charniak's parser,
etc.) I find the Python interface simple and easy to use, and they've
added a number of my suggestions to the API in the last release.
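
As a rough idea of what a "simple wrapper" around an external command-line tool can look like in Python, here is a minimal sketch that pipes text to a subprocess and collects its output lines. It does not use the Ellogon API, and the command shown is a stand-in (a Python one-liner that upper-cases its input), not the actual command line of MXPOST or any other tagger or parser; `run_text_tool` and `STAND_IN_COMMAND` are hypothetical names.

```python
import subprocess
import sys


def run_text_tool(command, text):
    """Pipe `text` to an external command's stdin and return its stdout lines."""
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool exits non-zero
    )
    return result.stdout.splitlines()


# Stand-in "tool": a Python one-liner that upper-cases whatever it reads.
# A real wrapper would pass the external tool's own command line instead.
STAND_IN_COMMAND = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

lines = run_text_tool(STAND_IN_COMMAND, "the cat sat\non the mat\n")
print(lines)
```

Keeping the command line as a parameter means the same function can front several tools, with only the output parsing differing per tool.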

STeVe
Jul 18 '05 #8
Terry Reedy wrote:
> "Maurice LING" <ma*********@acm.org> wrote in message
> news:42**************@acm.org...
>> Say I code my stuffs in Jython (importing java libraries) in a file
>> "text.py"
>
> Just to be clear, Jython is not a separate language that you code *in*, but
> a separate implementation that you may code slightly differently *for*.

Yes, I do get this point. Jython is just an implementation of the
Python virtual machine using Java. I do note that there are some
differences, such as that Jython can only handle pure Python modules.
However, I'm not enough of a language expert to tell the language
differences between these 2 implementations of Python, Jython and CPython.
If someone cares to enlighten me, I will be glad to learn. TIA.

>> ... Will there be any issues when I try to import text.py into CPython?
>
> If text.py is written in an appropriate version of Python, it itself will
> cause no problem. However, when it imports Java code files, as opposed to
> CPython bytecode files, CPython will choke.

In my example, the file "text.py" is coded in Jython, importing Java
libraries. I do get that I cannot import Java jar files directly into
CPython. What I do not get is what is so special about Jython that
it can "fool" CPython into using Java libraries... or is it that there will
always be a need for both a Java virtual machine and a Python virtual machine
when I use Java libraries in Jython... and import Jython-coded files
into CPython....

Cheers
Maurice

Cheers
Maurice
Jul 18 '05 #9
On Mon, 04 Apr 2005 09:36:32 +1000, Maurice LING <ma*********@acm.org>
declaimed the following in comp.lang.python:
> Yes, I do get this point rightly. Jython is just an implementation of
> Python virtual machine using Java. I do note that there are some

Pardon? I thought Jython directly used the Java VM... It is not a
Python VM at all. It's the same language at the source level, but a
totally different back-end.

Hence, it requires the JVM to be able to run anything that
imports a Java library. Pure Python (source code) is compatible because
the two implementations will "compile" into either JVM byte code
(Jython) or classic Python byte code (CPython).

The CPython /run time/ has no facilities for interpreting JVM
byte code and cannot, therefore, process Java library imports.
Similarly, the JVM has no facilities for interfacing with CPython
compiled libraries.
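
The "compile" step mentioned above can be observed from within Python itself. The following is a small sketch, assuming a CPython interpreter for the disassembly part; the source string and variable names are made up for illustration.

```python
import dis
import platform

SOURCE = "total = sum(n * n for n in range(4))\n"

# compile() turns the source text into a code object for *this* implementation.
code = compile(SOURCE, "<example>", "exec")

namespace = {}
exec(code, namespace)
print(platform.python_implementation(), "computed", namespace["total"])

if platform.python_implementation() == "CPython":
    # Disassemble the CPython byte code; the exact opcodes vary by version.
    dis.dis(code)
```

The source text is portable between implementations; the compiled form shown by `dis` is specific to CPython and is not something a JVM can execute.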

--
wl*****@ix.netcom.com | Wulfraed Dennis Lee Bieber KD6MOG
wu******@dm.net | Bestiaria Support Staff
Home Page: <http://www.dm.net/~wulfraed/>
Overflow Page: <http://wlfraed.home.netcom.com/>

Jul 18 '05 #10
Maurice LING wrote:
> Terry Reedy wrote:
>> "Maurice LING" <ma*********@acm.org> wrote in message
>> news:42**************@acm.org...
>>> Say I code my stuffs in Jython (importing java libraries) in a file
>>> "text.py"
>>
>> Just to be clear, Jython is not a separate language that you code
>> *in*, but a separate implementation that you may code slightly
>> differently *for*.
>
> Yes, I do get this point rightly. Jython is just an implementation of
> Python virtual machine using Java. I do note that there are some
> differences, such as, Jython can only handle pure python modules.
> However, I'm not a language expert to differentiate language differences
> between these 2 implementations of Python, as in Jython and CPython. If
> someone care to enlighten, it will be my pleasure to consult. TIA.

That's not strictly correct. The Python virtual machine isn't
implemented at all in Jython; instead, the JVM is used as the compilation
target.

>>> ... Will there be any issues when I try to import text.py into CPython?
>>
>> If text.py is written in an appropriate version of Python, it itself
>> will cause no problem. However, when it imports Java code files, as
>> opposed to CPython bytecode files, CPython will choke.
>
> In my example, the file "text.py" is coded in Jython, importing Java
> libraries. I do get that I cannot import Java jar files directly into
> CPython. What I do not get is that what is so special about Jython that
> it can "fool" CPython into using Java libraries... or is that there will
> always be a need for Java virtual machine and Python virtual machine
> when I use Java libraries in Jython... and importing Jython coded files
> into CPython....

Jython is pretty much a Python interpreter that compiles Python into JVM
bytecodes. Consequently the amount of "trickery" involved is rather
less, though clearly there is some (automated conversion between Java
and Python data types where appropriate, and automated signature-based
selection of the appropriate Java method being the two most obvious).
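
The two kinds of "trickery" mentioned here can be tried out with a short script. This is a hedged sketch: it only exercises Java when run under Jython and simply returns None on any other implementation, and the function name `java_interop_demo` is illustrative. The Java classes used (`java.util.ArrayList`, `java.lang.StringBuilder`) are standard JDK classes.

```python
import platform


def java_interop_demo():
    """Exercise Java interop on Jython; return None on other implementations."""
    if platform.python_implementation() != "Jython":
        return None

    # These imports resolve only on Jython, where Java packages are importable.
    from java.lang import StringBuilder
    from java.util import ArrayList

    words = ArrayList()
    words.add("text")        # the Python string is converted for the Java call
    words.add("analysis")

    builder = StringBuilder()
    # StringBuilder.append is overloaded in Java; Jython chooses which
    # overload to call from the type of the argument it is given.
    builder.append("count=")
    builder.append(words.size())
    return builder.toString()


print(java_interop_demo())
```

Run under CPython, the function takes the early return, which is consistent with the point made earlier in the thread that CPython cannot load Java libraries.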

regards
Steve
--
Steve Holden +1 703 861 4237 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/

Jul 18 '05 #11
