Bytes | Software Development & Data Engineering Community

Simple Bayesian classifier?

Hi all,

I am trying to build an application to classify texts from a number of
sources. I am programming it in PHP and I go "by the book" - i.e.
calculating probabilities according to the formula etc.
It works, but it's very slow (due to slow PHP mathematical
implementation, I guess).
Is there some variation of the Naive Bayes classifier which is not so
demanding in the way of computing power used?

Best
Pavel
Jun 8 '07 #1
On Jun 8, 11:52 am, Pavel Kalinov <pavk...@gmail.com> wrote:
> It works, but it's very slow (due to slow PHP mathematical
> implementation, I guess).
> Is there some variation of the Naive Bayes classifier which is not so
> demanding in the way of computing power used?

SpamAssassin's code is open source; have you checked it out?
http://svn.apache.org/viewvc/spamass...pm?view=markup
AFAIK PHP offloads its maths to C libraries, so the arithmetic itself is
unlikely to be the bottleneck. More likely the problem is that working
strictly by the book, with no code optimisation techniques (hash tables
and so on), is far more computationally intensive than it needs to be.
(A mathematician C programmer I know got their code to run in 2 days
rather than 2 weeks after some optimisation.)
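
To make the "hash tables and so on" point concrete, here is a minimal sketch of a multinomial Naive Bayes classifier that does all the expensive work once, at training time. It is written in Python for brevity; the dictionaries map directly onto PHP associative arrays. The names `train` and `classify`, the whitespace tokenizer, and the Laplace smoothing constant `alpha` are illustrative assumptions, not anything taken from SpamAssassin or from Pavel's code. Classifying a text then costs one table lookup and one addition per word per category, with no multiplication of tiny probabilities.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs, alpha=1.0):
    """Precompute log-probability tables from (category, text) pairs.

    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of documents
    for category, text in labelled_docs:
        word_counts[category].update(text.lower().split())
        doc_counts[category] += 1

    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(doc_counts.values())

    tables = {}
    for category, counts in word_counts.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        tables[category] = {
            "prior": math.log(doc_counts[category] / total_docs),
            # Log-probability of a known word never seen in this category.
            "unseen": math.log(alpha / denom),
            "words": {w: math.log((c + alpha) / denom) for w, c in counts.items()},
        }
    return {"vocab": vocab, "tables": tables}


def classify(model, text):
    """Return the category with the highest summed log-probability."""
    # Words that appear in no training document carry no information; skip them.
    words = [w for w in text.lower().split() if w in model["vocab"]]
    best, best_score = None, -math.inf
    for category, table in model["tables"].items():
        word_logs, unseen = table["words"], table["unseen"]
        score = table["prior"] + sum(word_logs.get(w, unseen) for w in words)
        if score > best_score:
            best, best_score = category, score
    return best
```

If the tables are built once and cached (serialised to disk or kept in shared memory) rather than recomputed on every request, the per-article cost should be dominated by tokenising the text.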

Jun 8 '07 #2
At Fri, 08 Jun 2007 20:52:39 +1000, Pavel Kalinov let h(is|er) monkeys
type:
> Is there some variation of the Naive Bayes classifier which is not so
> demanding in the way of computing power used?

You may like http://xhtml.net/php/PHPNaiveBayesianFilter
I am a bit surprised you are seeing such slow responses; the typical
algorithms don't seem to be extremely taxing.

As part of an author authenticity scoring app, Naive Bayesian filtering
proved quite useful. For spam filtering, its use (by itself) proves rather
limited: quite a few spam creators (scripts) are well equipped these days
to lower their scores substantially, allowing their messages to leak through.

hth

--
Schraalhans Keukenmeester - sc*********@the.Spamtrapexample.nl
[Remove the lowercase part of Spamtrap to send me a message]

"strcmp('apples','oranges') < 0"

Jun 8 '07 #3
Thanks, I didn't know this - will look into it.
BTW, I am not trying to make a spam filter, but to sort news articles into
a number of categories (16 at present, as a test). And I need
milliseconds, not days :-(

Best
Pavel

shimmyshack wrote:
> spamassasin's code is OS, have you checked that out?
> http://svn.apache.org/viewvc/spamass...pm?view=markup
> AFAIK php offloads its maths to c libraries; so your problem is that
> it can be much more computationally intensive to work by the book,
> with no code optimisation techniques etc... (hash tables and so on).
> (A mathematician C programmer I know got their code to run in 2 days
> rather than 2 weeks after some optimisation)
Jun 11 '07 #4
Pavel Kalinov wrote:
> BTW, I am not trying to make a spam filter, but to sort news articles in
> a number of categories (16 at present, as test). And I need
> milliseconds, not days :-(

Still, SpamAssassin might be what you're looking for.

Turn off all of SA's non-Bayes scoring, then feed SA a corpus of, say, 500
sports articles, telling it that they're "spam", followed by 500 non-sports
articles, telling it that they're "ham". After this preparation, your SA
configuration should be primed to detect sports articles.

Repeat this for another 15 SA configurations, one per remaining category,
and your setup should be complete.

With SA, one user can have multiple configurations using the "--configpath"
command-line option.
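
The one-filter-per-category idea can be sketched independently of SpamAssassin. The Python below is only an illustration of the scheme, not SA's actual Bayes implementation: each category gets its own binary "this category vs. everything else" word table, and an article goes to the category whose filter scores it highest. The function names, the whitespace tokenizer, and the smoothing constant `alpha` are assumptions of the sketch, and class priors are left out, which assumes the training sets are roughly balanced.

```python
import math
from collections import Counter


def train_binary(positive_docs, negative_docs, alpha=1.0):
    """Build a word -> log-odds table for one 'category vs. the rest' filter."""
    pos = Counter(w for d in positive_docs for w in d.lower().split())
    neg = Counter(w for d in negative_docs for w in d.lower().split())
    vocab = set(pos) | set(neg)
    pos_denom = sum(pos.values()) + alpha * len(vocab)
    neg_denom = sum(neg.values()) + alpha * len(vocab)
    return {
        w: math.log((pos[w] + alpha) / pos_denom)
        - math.log((neg[w] + alpha) / neg_denom)
        for w in vocab
    }


def train_one_vs_rest(docs_by_category):
    """Train one binary filter per category, like one SA configuration each."""
    filters = {}
    for category, docs in docs_by_category.items():
        rest = [
            d
            for other, other_docs in docs_by_category.items()
            if other != category
            for d in other_docs
        ]
        filters[category] = train_binary(docs, rest)
    return filters


def best_category(filters, text):
    """Score the text with every filter and return the highest-scoring category."""
    words = text.lower().split()
    # Words a filter has never seen contribute 0.0, i.e. no evidence either way.
    scores = {
        category: sum(table.get(w, 0.0) for w in words)
        for category, table in filters.items()
    }
    return max(scores, key=scores.get)
```

With 16 categories this means 16 table lookups per word, so classification time grows linearly with the number of categories but should remain far from "days" territory.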

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.12-12mdksmp, up 108 days, 16 min.]

URLs in demiblog
http://tobyinkster.co.uk/blog/2007/05/31/demiblog-urls/
Jun 11 '07 #5
