[warning: 99% OT] does anything like this exist ?...


First off, sorry for this message-in-a-bottle-like post... I haven't been
able to phrase my questions well enough to get a meaningful answer from
Google in my research. OTOH, it is standard flattery (but true) that this
group has a bunch of the nicest and most knowledgeable Usenet people
around, and I know for a fact that there are some pretty good spam-
related tools written in Python, so I thought I might get away with it
:-)

Yes, it's about spam.

I have a (very old, POP3) email address that's flooded with spam (about
500 msgs per 24 hrs, no thanks to the ISP...) but that I still need to
use for various reasons. Of course it's unmanageable without some sort of
bayesian spam filter.

When I work from home or office or my laptop, tools like Thunderbird or
Pegasus+K9 do the job adequately. But I also frequently need to access it
"on the road" with whatever comes in handy (Webmail etc.), and that
becomes a problem if I've been away for even a short period.

One possible solution I've been mulling over, and looking for, would be
some sort of selective, on-line filter/downloader. I have an always-on,
Linux box at home on a decent DSL line ; on this I could have a daemon to
frequently poll that POP3 address, pulling only the headers (or maybe the
first body KB or so) and running those through a bayesian filter. If a
message comes out "clean" it is left on the server (so I can access it
from wherever I am), if it doesn't it is downloaded to a local mbox and
removed from the server (like getmail.py does) so I can still dig out the
occasional false positive when I'm home...
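
For illustration, the polling scheme described above could be sketched with the standard library's poplib and mailbox modules. This is only a minimal sketch under assumptions, not an existing tool: cull_spam and the looks_like_spam callback are hypothetical names, and the callback stands in for whatever Bayesian filter is used.

```python
import mailbox
import poplib  # used by the caller to open the connection, see the note below


def cull_spam(pop, looks_like_spam, spam_box, body_lines=20):
    """Leave clean mail on the server; move suspected spam to a local mbox.

    pop             -- a logged-in poplib.POP3 (or POP3_SSL) connection
    looks_like_spam -- hypothetical callback: bytes -> bool (e.g. a Bayesian filter)
    spam_box        -- a mailbox.mbox that receives the removed messages
    Returns the number of messages marked for deletion.
    """
    count, _size = pop.stat()
    removed = 0
    for num in range(1, count + 1):
        # TOP returns the headers plus the first `body_lines` lines of the body.
        _resp, lines, _octets = pop.top(num, body_lines)
        preview = b"\r\n".join(lines)
        if not looks_like_spam(preview):
            continue  # clean: leave it on the server for access from elsewhere
        # Suspected spam: fetch it in full, keep a local copy, then mark it deleted.
        _resp, lines, _octets = pop.retr(num)
        spam_box.add(b"\n".join(lines) + b"\n")
        pop.dele(num)
        removed += 1
    spam_box.flush()
    return removed
```

A caller would open the connection with poplib.POP3(host), log in with user() and pass_(), call cull_spam(), and then call quit(); POP3 servers only carry out the deletions when the session ends with QUIT.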

Is there a ready-made tool that fills this need ? If it's in Python so
much the better, but actually I'll run anything else within my means
(even perl :-) if it works without having to tinker the code... Also,
it's possible that my single-minded approach is misled and there's a
better way to achieve that goal, so I'm ready to change my mind too :)

TIA - again, sorry for the noise,
fp

--
YAFAP : http://www.multimania.com/fredp/
Jul 18 '05 #1
On Tue, 2004-10-26 at 13:40 +0000, Fred Pacquier wrote:
I have a (very old, POP3) email address that's flooded with spam (about
500 msgs per 24 hrs, no thanks to the ISP...) but that I still need to
use for various reasons. Of course it's unmanageable without some sort of
bayesian spam filter.

One possible solution I've been mulling over, and looking for, would be
some sort of selective, on-line filter/downloader. I have an always-on,
Linux box at home on a decent DSL line ; on this I could have a daemon to
frequently poll that POP3 address, pulling only the headers (or maybe the
first body KB or so) and running those through a bayesian filter. If a
message comes out "clean" it is left on the server (so I can access it
from wherever I am), if it doesn't it is downloaded to a local mbox and
removed from the server (like getmail.py does) so I can still dig out the
occasional false positive when I'm home...


Rather than doing that, you might consider simply setting up a local
IMAP server on your Linux box. Have a program (such as fetchmail) pull
down your POP3 email, filter it using procmail and feed it into the IMAP
server so that you can then access it from anywhere.

--
Cliff Wells <cl************ @comcast.net>

Jul 18 '05 #2
Cliff Wells <cl************ @comcast.net> said :
Rather than doing that, you might consider simply setting up a local
IMAP server on your Linux box. Have a program (such as fetchmail)
pull down your POP3 email, filter it using procmail and feed it into
the IMAP server so that you can then access it from anywhere.


Thanks, Cliff. I realize this would be the "standard" way of going about
such things (didn't want to bloat the original post :-). However I tend to
think of it as a fallback solution if nothing else comes up, if only for
the following reasons :

* a lot of upfront work : several major packages (fetchmail, procmail,
imapd) I have no hands-on experience with, a lot of reading/learning and a
lot of little things to get just right in each... In my mind the solution
I'm after could be a much simpler affair, but then maybe I'm daydreaming :)

* one more potential security hole on my home machine : I tend to limit
open ports on that one to a bare minimum, my admin skills & available time
being quite limited...

* availability : my ISP doesn't filter spam but otherwise does a good job
of keeping its server up. Depending on my own setup means I could get shut
out when away from home because of a power failure, DSL downtime, a full
disk, or any of a myriad other domestic problems...

I've been paged about so many similar setups in the workplace that I'll
avoid them at home if I can... But then, sometimes you _do_ have to turn to
plan B :-)

--
YAFAP : http://www.multimania.com/fredp/
Jul 18 '05 #3

[Fred]
I could have a daemon to
frequently poll that POP3 address, pulling only the headers (or maybe the
first body KB or so) and running those through a bayesian filter. If a
message comes out "clean" it is left on the server (so I can access it
from wherever I am), if it doesn't it is downloaded to a local mbox and
removed from the server (like getmail.py does) so I can still dig out the
occasional false positive when I'm home...


Andrew Dalke contributed a program to the SpamBayes wiki that does
exactly this:

http://www.entrian.com/sbwiki/SpamBayesCuller
http://www.entrian.com/sbwiki/Recent...pamBayesCuller

"This program, sb_culler, uses SpamBayes to run a POP3 email culler. It
connects to my email servers every few minutes, downloads the emails,
classifies each one, and deletes the spam and viruses. (It makes a
local copy of the spam, just in case.)"

You'll need to download and install the SpamBayes source code as well,
from:
http://sourceforge.net/project/showf...ckage_id=58141

Take either the .tgz or the .zip, not the .exe (which installs a binary
application rather than the source).

--
Richie Hindle
ri****@entrian. com

Jul 18 '05 #4
Fred Pacquier wrote:

Is there a ready-made tool that fills this need ?


I don't know of any, but writing one is not a big problem: Python has built-in
support for POP3 and mail processing (examples are in its documentation),
so just download the mail, pipe it through SpamAssassin and then delete it
from the server as needed.
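
As a rough illustration of the "pipe it through" step, here is a minimal sketch using subprocess and the email package. The function name flagged_by_filter is hypothetical, and the code assumes the filter command reads a message on stdin and writes it back on stdout with an "X-Spam-Flag: YES" header added for spam, which is how SpamAssassin behaves in its usual default configuration; a local setup may differ.

```python
import email
import subprocess


def flagged_by_filter(raw_message, command=("spamassassin",)):
    """Pipe one raw message (bytes) through an external filter and read its verdict.

    Assumption: the command echoes the message on stdout and adds an
    "X-Spam-Flag: YES" header when it considers the message spam.
    """
    result = subprocess.run(command, input=raw_message,
                            stdout=subprocess.PIPE, check=True)
    annotated = email.message_from_bytes(result.stdout)
    # A missing header is treated as "not spam".
    return annotated.get("X-Spam-Flag", "").strip().upper() == "YES"
```

The command is a parameter so that another filter with the same stdin/stdout convention can be swapped in without touching the rest of the code.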

--
Maciej "Fiedzia" Dziardziel (fiedzia (at) fiedzia (dot) prv (dot) pl)
www.fiedzia.prv.pl

My drinking team has a soccer problem.
Jul 18 '05 #5
Richie Hindle wrote:
Andrew Dalke contributed a program to the SpamBayes wiki that does
exactly this:

http://www.entrian.com/sbwiki/SpamBayesCuller
http://www.entrian.com/sbwiki/Recent...pamBayesCuller

You'll need to download and install the SpamBayes source code as well,


I got email about 6 months ago asking if it is okay to include my
sb_culler program with the SpamBayes source code. I haven't checked
to see if that actually happened.

As written it's designed for a programmer to use. You had to
edit code to change options.

I've added a few new features since then, like reloading the
good_emails file if there are changes. Want to add another to
whitelist based on domain, and to remove emails with a given
subject, like "Sprava nebola dorucena". Perhaps I'll update
the wiki afterwards.
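
The features mentioned above (a whitelist file that is re-read when it changes, a domain whitelist, and removal by subject) could look roughly like the following. This is not sb_culler's actual code: the class and function names are hypothetical, and the one-address-per-line file format is an assumption.

```python
import os
from email.utils import parseaddr


class ReloadingAddressList:
    """A set of addresses kept in a text file, re-read whenever the file changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._entries = set()

    def entries(self):
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:  # first call, or the file was edited since
            with open(self.path, encoding="utf-8") as fh:
                self._entries = {line.strip().lower() for line in fh if line.strip()}
            self._mtime = mtime
        return self._entries


def verdict(from_header, subject, good_addresses, good_domains=(), bad_subjects=()):
    """Return "keep", "delete", or "classify" (hand over to the spam filter)."""
    address = parseaddr(from_header)[1].lower()
    domain = address.rpartition("@")[2]
    if address in good_addresses or domain in good_domains:
        return "keep"
    # Case-insensitive substring match against each unwanted subject.
    if any(s.lower() in subject.lower() for s in bad_subjects):
        return "delete"
    return "classify"
```

Checking the whitelist before the subject list means a known correspondent is never deleted on subject alone; the opposite order would be an equally defensible choice.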

Andrew
da***@dalkescie ntific.com
Jul 18 '05 #6
Richie Hindle <ri****@entrian .com> said :
Andrew Dalke contributed a program to the SpamBayes wiki that does
exactly this:

"This program, sb_culler, uses SpamBayes to run a POP3 email culler.
It connects to my email servers every few minutes, downloads the emails,
classifies each one, and deletes the spam and viruses. (It makes a
local copy of the spam, just in case.)"


Wo-ow ! Talk about being spoiled... :-)
Of course I suspected I was not alone in wanting to do this, and that
someone had already done it. But to see it described so exactly (only
better and in fewer words), and done in python too, and using Spambayes, no
less... well, I hadn't dared expect that much. Sort of makes me regret not
asking up front for that flashy pink finish and the leather upholstery, too
:)

Actually I should feel ashamed : such a happy ending will undoubtedly
unleash a tsunami of rabid OT posters on this last peaceful corner of a
beleaguered Usenet, which will subsequently collapse in the next hundred
days, taking down with it the Internet and civilization as we know it. I'm
really sorry folks...

Meanwhile, thanks a lot for the pointer Richie !

--
YAFAP : http://www.multimania.com/fredp/
Jul 18 '05 #7
Andrew Dalke <ad****@mindspr ing.com> said :
I got email about 6 months ago asking if it is okay to include my
sb_culler program with the SpamBayes source code. I haven't checked
to see if that actually happened.

I just did : it was added to CVS on June 11, same code as on the Wiki,
but apparently not updated since, and not included in the contribs for
the 1.0 release.

I already have a couple of naive questions :

* is the documented change on line 348 enough to run the script with a
current version of Spambayes ?

* does use of sb_culler contribute to the training of the Spambayes db,
or does it assume that it is kept current independently (by means of
normal use by a mail client through the POP3 proxy for instance) ?

As written it's designed for a programmer to use. You had to
edit code to change options.

That's OK with me, as long as the code is in python :)

I've added a few new features since then, like reloading the
good_emails file if there are changes. Want to add another to
whitelist based on domain, and to remove emails with a given
subject, like "Sprava nebola dorucena". Perhaps I'll update
the wiki afterwards.


Well, if any of these new additions are available for outside use, I'd
sure appreciate a copy of the updated script...

TIA,
fp

--
YAFAP : http://www.multimania.com/fredp/
Jul 18 '05 #8
Maciej Dziardziel <fi*****@fiedzi a.prv.pl> said :
Is there a ready-made tool that fills this need ?


I don't know of any, but writing one is not a big problem: Python has
built-in support for POP3 and mail processing (examples are in its
documentation), so just download the mail, pipe it through SpamAssassin
and then delete it from the server as needed.


I sort of knew this one would be forthcoming :)

I won't say I haven't been tempted - like everyone else here I've written
my own 'experimental' POP3 checker using the python standard lib, and
several more from the 'yet another' category... but I know my limits.

Also, over the years I've had my share of "unique/original" (heh) ideas,
spending ages to implement them, only to find out afterwards that someone
had already done it, and generally much better... just like Andrew Dalke
with sb_culler in this case.

I'm not looking for a pet project here, just to solve a problem. If I can
do it in a few hours of tinkering with a ready-made answer I certainly
won't try to roll my own. If I were a student, or in that mythical state
between retirement and brain-rot, maybe, but for now it's definitely Plan C
(or D :-)

--
YAFAP : http://www.multimania.com/fredp/
Jul 18 '05 #9
On Tue, 26 Oct 2004, Cliff Wells wrote:
Rather than doing that, you might consider simply setting up a local
IMAP server on your Linux box. Have a program (such as fetchmail) pull
down your POP3 email, filter it using procmail and feed it into the IMAP
server so that you can then access it from anywhere.


I've been using Charles Cazabon's getmail (pure Python) for quite some
time in place of fetchmail.

-------------------------------------------------------------------------
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: an*****@bullsey e.apana.org.au (pref) | Snail: PO Box 370
an*****@pcug.or g.au (alt) | Belconnen ACT 2616
Web: http://www.andymac.org/ | Australia
Jul 18 '05 #10
