Bytes | Software Development & Data Engineering Community

Information, code, or reading about machine-based textual analysis and classification?

I'm hoping someone here can point the way toward a fairly specialized topic.
I have large amounts of content that need to be classified. Irrelevant or
"uninteresting" (by our criteria) articles need to be disposed of.
"Interesting" articles should be tagged with a number of points of metadata.

As much as possible, I would like machines to do this work. Fairly dumb
ways of doing this might include using our existing databases/metadata as
keyword collections and classifying based on brute-force scans and matches
against our content (e.g. the strings "Department of Laboratory Medicine",
"Dr. James Fine"). Smarter ways may have been suggested by some of the
presentations at the recent "Google Developer Day" in Mountain View:
programmatic analysis of seed texts led to mechanisms of analysis that
seemed much more efficient than raw text scanning.
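
To make the "fairly dumb" approach concrete, here is a minimal Python sketch of a brute-force keyword scan. The KEYWORDS table and the find_keywords function are hypothetical names for illustration only; the tags are invented, and a real keyword collection would presumably be loaded from the existing databases/metadata.

```python
import re

# Hypothetical keyword collection mapping a phrase to a metadata tag.
# The two phrases come from the post; the tags are made up for illustration.
KEYWORDS = {
    "Department of Laboratory Medicine": "department",
    "Dr. James Fine": "person",
}

def find_keywords(text, keywords=KEYWORDS):
    """Return sorted (phrase, tag) pairs for every keyword phrase found in text."""
    hits = []
    for phrase, tag in keywords.items():
        # Allow any run of whitespace (including line wraps) between the
        # words of the phrase, and ignore case.
        pattern = r"\s+".join(re.escape(word) for word in phrase.split())
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append((phrase, tag))
    return sorted(hits)
```

An article with no hits could be treated as "uninteresting", and the tags of the hits could become its metadata. This is plain substring-style matching, so it will miss misspellings and rephrasings and cannot judge context.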

I am speculating based on no real knowledge, but I imagine it would be
possible to develop some kind of "relevance index" for an item as compared
to an existing body of text, and keep or dump based on a threshold. More
interestingly, maybe I have classified a thousand articles as, say, "UW
biomedical research," and there is an algorithmic means by which we could
assess the "UW biomedical research-ness" of an unknown text. That would be
very useful.
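
One common way to get a score of this kind is a naive Bayes text classifier trained on already-classified articles. The sketch below is only an illustration under assumptions: it needs a set of articles known to be outside the category as well as the positive ones, and the names NessScorer, tokenize, and score are hypothetical, not part of any existing library.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NessScorer:
    """Two-class multinomial naive Bayes with add-one smoothing (a sketch)."""

    def __init__(self, positive_docs, negative_docs):
        # Word counts for articles inside and outside the category.
        # Both lists are assumed to be non-empty.
        self.pos = Counter(t for d in positive_docs for t in tokenize(d))
        self.neg = Counter(t for d in negative_docs for t in tokenize(d))
        self.vocab = set(self.pos) | set(self.neg)
        self.pos_total = sum(self.pos.values())
        self.neg_total = sum(self.neg.values())
        # Prior log-odds from the number of training articles per class.
        self.prior = math.log(len(positive_docs) / len(negative_docs))

    def score(self, text):
        """Log-odds that text belongs to the category; above 0 leans 'yes'."""
        total = self.prior
        v = len(self.vocab)
        for tok in tokenize(text):
            if tok not in self.vocab:
                continue  # words never seen in training carry no evidence
            p_pos = (self.pos[tok] + 1) / (self.pos_total + v)
            p_neg = (self.neg[tok] + 1) / (self.neg_total + v)
            total += math.log(p_pos / p_neg)
        return total
```

The returned log-odds can serve as the "relevance index": keep an article when `scorer.score(article)` exceeds a threshold chosen by checking scores on a held-out set of hand-classified articles, and dump it otherwise. Longer texts accumulate larger absolute scores, so a threshold may need to account for length.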

Are there resources or readings I can be looking at? Are there any
pre-existing libraries or frameworks or tools that could ease this task? The
content lives in MS SQL Server 2005 or can be placed in it; Index Server is
installed on my servers; maybe there are things within these tools that can
help.

Thanks in advance for any leads you can offer.

-KF
Jun 20 '07 #1
