Spambayes modifications with web services

In the last few months many personal website owners (such as myself)
have found that spammers have been using their domain names to
masquerade as valid users to send spam, normally in the form of:

Jo**********@mydomain.com

This new tactic has an annoying side effect: the bounced emails end up
back with the postmaster at the innocent person's domain.
This is normally the first time that the domain owner realises that
there is a problem.

I am one of those people and currently have nearly 3 thousand bounces
in my catch all POP3 box.

Solutions I can see to this are one of two things:

1) Delete the email as it arrives and ignore it. Realise that the
domain name might end up being blacklisted as a spammer's domain and be
done with it, or

2) Fight back! All of the bounced emails contain at least one URI to a
spammer website, in an effort to sell "Cheap Meds" or "Faked Rolexes" or
similar. The format is usually something like this:

http://www.sickmate.info/?a2fb9e415e...ee919d78Sa6a7d

The query part of the URI I believe provides the reference between the
email address and the visit. Hence if you visit the website with this
link, your email address is saved in a database as one that is a)
valid, and b) dumb enough to visit the website.

The spammers rely on the fact that some people will visit this website
and buy from them. In fact, Q.E.D., some people must buy from these
websites via spam, otherwise the spammers would have given up a long
time ago*.

So, as a web programmer and someone who specialises in getting good
results on Google, I realised that I could simply post every spammer
website on a Google optimized page, which if searched for on Google
would return something like:

"WARNING: DO NOT BUY FROM THIS WEBSITE. THE SPAMMER IS A RUSSIAN MAFIA
CROOK WHO WILL STEAL YOUR MONEY."

....Or something equally obvious along those lines. In this way we
attack the websites that are the link between the spam and the money.
The real necessity therefore is to:

a) Process the received bounced messages quickly and list them on the
website without delay.
b) Prevent the spammer using the domain

The answer to (b) I cannot find. I thought SPF might help, but it is
not a panacea. The answer to (a) I need help with!

So, I'm on Windows XP. I use Outlook 2002 and I already have the
excellent (and FREE) SpamBayes Outlook add-in** that blocks spam and
loves ham. Spambayes is open source and as such I can modify the source
code, recompile it and install it afresh. However, the problem is that
I'm not a python programmer, and I'm not sure where to start. This is
what I want to do, so if anyone would like to direct me, I'd be
grateful:

1) Add a menu option to the SpamBayes add-in - "Post Spam Site to Web
Service". I'm guessing I can add a new line to the addin.py such as
below, but how do I sink the event?

    self._AddControl(popup,
                     constants.msoControlButton,
                     ButtonEvent, (PostSpamSite, self.manager,),
                     Caption="Post Spam Site to Web Service",
                     Enabled=True,
                     Visible=True,
                     Tag="SpamBayesCommand.PostSpam")

2) Add a configuration setting, so that the web service location can be
set. I'm guessing this is in config.py. Pointers welcome.

3) Add a function to extract all links in a block of text. I have
written a good one of these for .NET, but I'm not sure if, or how it
would work in Python:

string hrefPattern =
@"(?<all>(?:(?<protocol>http(?:s?)|ftp)(?:\:\/\/))"
+ @"(?<domain>[^/\r\n\:]+)?"
+ @"(?<port>\:\d+)?"
+ @"(?<path>[^\?#]*)?"
+ @"(?<qrystr>\?\w*)?"
+ @"(?<bookmark>\#\w*)?)";

// Regular Expression
Regex hrefRegex = new Regex(hrefPattern, RegexOptions.Singleline |
RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

Any help with this welcome. Do I need a specific Python regex library
or can I use the .NET regex library in Python?
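
For what it is worth, Python ships with its own regex module, "re", so no extra library should be needed. Below is a minimal sketch of how the .NET pattern above might be ported. The group names are kept, but named groups are spelled (?P<name>...) in Python. Two things are assumptions rather than a faithful copy: the domain is made mandatory, and the path, query and bookmark parts are tightened so that a match stops at whitespace, quotes or angle brackets. The function names are made up.

```python
import re

# re.VERBOSE plays the role of RegexOptions.IgnorePatternWhitespace.
HREF_PATTERN = re.compile(
    r"""
    (?P<all>
        (?P<protocol>https?|ftp)://
        (?P<domain>[^/\s:?\#"'<>]+)
        (?P<port>:\d+)?
        (?P<path>/[^\s?\#"'<>]*)?
        (?P<qrystr>\?[^\s\#"'<>]*)?
        (?P<bookmark>\#[^\s"'<>]*)?
    )
    """,
    re.VERBOSE | re.IGNORECASE,
)


def extract_links(text):
    """Return one dict per URL found, keyed by the group names above."""
    return [match.groupdict() for match in HREF_PATTERN.finditer(text)]


def extract_domains(text):
    """Return the distinct domains found in text, lower-cased and sorted."""
    domains = set()
    for match in HREF_PATTERN.finditer(text):
        # A URL at the end of a sentence can drag in a trailing full stop.
        domains.add(match.group("domain").lower().rstrip("."))
    return sorted(domains)
```

Groups that did not take part in a match come back as None, so a link without a port has "port" set to None in the dict.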

4) Connect to web service using SOAP and consume that service. Service
will provide:

a) Authorise (username, password) - returns access
b) Submit (domain) - returns success or failure

Can I use SOAPpy for this? Can anyone give me any examples or point me
in the right direction?
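
As a rough illustration of what such a call involves, here is a sketch that builds a SOAP 1.1 request by hand using only the standard library, which sidesteps the choice of SOAP toolkit. The operation names (Authorise, Submit) and parameter names come from the list above. The service namespace, the SOAPAction format and the helper names are assumptions, and a real service's WSDL would dictate the actual values.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace
SERVICE_NS = "urn:example:spam-sites"  # hypothetical service namespace


def _qualified(namespace, name):
    """Return a tag in ElementTree's {namespace}name notation."""
    return "{%s}%s" % (namespace, name)


def build_envelope(operation, params):
    """Build a SOAP 1.1 request envelope and return it as UTF-8 bytes."""
    envelope = ET.Element(_qualified(SOAP_ENV, "Envelope"))
    body = ET.SubElement(envelope, _qualified(SOAP_ENV, "Body"))
    call = ET.SubElement(body, _qualified(SERVICE_NS, operation))
    for name, value in params.items():
        ET.SubElement(call, _qualified(SERVICE_NS, name)).text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def call_service(url, operation, params, timeout=10):
    """POST the request and return the parsed response root element."""
    request = urllib.request.Request(
        url,
        data=build_envelope(operation, params),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            # Assumed format; the service definition decides the real value.
            "SOAPAction": '"%s#%s"' % (SERVICE_NS, operation),
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return ET.fromstring(response.read())
```

Usage would then be something like call_service(url, "Authorise", {"username": ..., "password": ...}) followed by call_service(url, "Submit", {"domain": ...}). How the access token returned by Authorise is passed to Submit is not specified above, so it is left out here.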

5) Provide another option in the add in to "Scan folder and Post Spam
Sites to Web Service", in the same manner as "Filter messages" works
now. Can I use filter.py as a model to work from?
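
Independently of how filter.py walks an Outlook folder (not shown here), the scanning step itself could look roughly like the sketch below. It assumes the message bodies have already been pulled out as plain strings, counts how many messages mention each domain, and skips domains that were already reported. Nothing is posted automatically, so a human can still review the candidates. All names are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Deliberately loose URL matcher: stops at whitespace, quotes or angle brackets.
URL_RE = re.compile(r"""(?:https?|ftp)://[^\s"'<>]+""", re.IGNORECASE)


def domains_in(body):
    """Return the set of lower-cased host names linked from one message body."""
    hosts = set()
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname  # already lower-cased by urlsplit
        except ValueError:
            continue  # malformed URL, e.g. a broken IPv6 literal
        if host:
            hosts.add(host)
    return hosts


def scan_folder(bodies, already_reported=()):
    """Count, per domain, how many messages link to it.

    bodies is any iterable of message body strings; domains listed in
    already_reported are left out of the result.
    """
    skip = {domain.lower() for domain in already_reported}
    counts = Counter()
    for body in bodies:
        for host in domains_in(body):
            if host not in skip:
                counts[host] += 1
    return counts
```

The resulting Counter can be shown to the user sorted by count (counts.most_common()), with each entry offered for manual checking before anything is submitted.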

Summary
=================================
I am not a Python programmer per se but have no problem with getting my
hands dirty. I have already got the basics of this working as a
Windows.Forms application, but running both that and Outlook together
is daft. The Spambayes project already does the hard bit in classifying
the spam, so it makes sense to hang off the back of it.

Has anyone else had similar problems with these "phantom" email
addresses being used by spammers and would like to work with me on
this? Would anyone in the Spambayes team like to have a go at this, or
point me in the right direction? Has anyone had a go at hacking around
with the SpamBayes source code and knows what I should do?

Basically any help is extremely welcome!

Regards

Ben

* There must obviously be enough people out there who can't get an
erection, or who are dumb enough to munch pills to get slim rather than
endure a bit of exercise. That being said, they will also trust their
credit card to a bunch of crooks who, even if they send you the pills,
will probably send you rat poison!

** Get the FREE Spambayes Outlook add-in from
http://sourceforge.net/projects/spambayes

Oct 28 '05 #1
5 Replies


benmorganpowell:
So, as a web programmer and someone who specialises in getting good
results on Google, I realised that I could simply post every spammer
website on a Google optimized page, which if searched for on Google
would return something like:

"WARNING: DO NOT BUY FROM THIS WEBSITE. THE SPAMMER IS A RUSSIAN MAFIA
CROOK WHO WILL STEAL YOUR MONEY."


Spam may also contain the addresses of non-spamming businesses,
sometimes in an effort to increase the apparent legitimacy of the spam.
Attacking all of the web sites in spam will cost these businesses while
having little effect on spammers who use temporary domains which may be
cheaply abandoned. It also gives another attack vector for those
criminals that attempt to extort money from web sites by threatening to
damage them.

If you want to take action against spammers, first think through all
the potential consequences of your actions.

Neil
Oct 28 '05 #2

be*************@gmail.com wrote:
In the last few months many personal website owners (such as myself)
have found that spammers have been using their domain names to
masquerade as valid users to send spam, normally in the form of:

<snip>

So, as a web programmer and someone who specialises in getting good
results on Google, I realised that I could simply post every spammer
website on a Google optimized page, which if searched for on Google
would return something like:

"WARNING: DO NOT BUY FROM THIS WEBSITE. THE SPAMMER IS A RUSSIAN MAFIA
CROOK WHO WILL STEAL YOUR MONEY."

<snip>

So basically a DoS attack could now be performed simply by crafting a
spam message, adding your target's URL to it, and sending it out to as
many users as you can think of? (Not a DoS in the typical form, but the
effect would be just as real: denying them legitimate customers.)

Nice plan sherlock.

--
Lasse Vågsæther Karlsen
http://usinglvkblog.blogspot.com/
mailto:la***@vkarlsen.no
PGP KeyID: 0x2A42A1C2
Oct 28 '05 #3

Thank you for the flippant remarks. Let's just say that I found them to
be unproductive.

I would like to point out the process was not designed to be automatic
and I don't believe I made such a statement. I should clarify that my
desire was to list each domain that was contained in a spam email, so
that the user could then:

- check if previously it has been reported as spam, or
- open the link in their browser, and
- check whether the domain was spam or ham, and then if spam
- post it to the web service ("Post Spam Site to Web Service").

Therefore, thanks, yes, I did "think through the consequences of my
actions".

The line that reads "WARNING: DO NOT BUY FROM THIS WEBSITE. THE SPAMMER
IS.....", was tongue in cheek, and it seems to be the line that stirred
up the condescending comments. What I should have written was something
more along the lines of:

"WARNING: The website <domain> has been reported by <x> users as a
website that uses illegal spam email to generate business leads"

I think that is a perfectly useful and fair statement, which I cannot
see damaging legitimate business enterprises.

I also think it would be quite useful for consumers to know that the
domain name they are about to purchase had previously been misused by
spammers, and was quite likely to be blacklisted by spam software.

I must say that I am surprised that the python group could be so
unfriendly and unhelpful.

Many thanks.

Nov 2 '05 #4

be*************@gmail.com wrote:
Thank you for the flippant remarks. Let's just say that I found them to
be unproductive.

I would like to point out the process was not designed to be automatic
and I don't believe I made such a statement. I should clarify that my
desire was to list each domain that was contained in a spam email, so
that the user could then:

- check if previously it has been reported as spam, or
- open the link in their browser, and
- check whether the domain was spam or ham, and then if spam
- post it to the web service ("Post Spam Site to Web Service").

Therefore, thanks, yes, I did "think through the consequences of my
actions".

The line that reads "WARNING: DO NOT BUY FROM THIS WEBSITE. THE SPAMMER
IS.....", was tongue in cheek, and it seems to be the line that stirred
up the condescending comments. What I should have written was something
more along the lines of:

"WARNING: The website <domain> has been reported by <x> users as a
website that uses illegal spam email to generate business leads"

I think that is a perfectly useful and fair statement, which I cannot
see damaging legitimate business enterprises.

I also think it would be quite useful for consumers to know that the
domain name they are about to purchase had previously been misused by
spammers, and was quite likely to be blacklisted by spam software.

I must say that I am surprised that the python group could be so
unfriendly and unhelpful.

Many thanks.

Personally I didn't regard the reply as unhelpful, and I believe the
replier was honestly trying to get you to see that your rather naive
suggestion was most unlikely to make things better.

In essence your proposal appears to be that we clog up the search
services with long-lived references to pages in your sites that refer to
the spammers' (short-lived) sites in derogatory terms. If you managed to
get enough people to be so unwise then Google and the other search
engines would be useless within a month, and the spammers' sites would
dominate their content.

You say you specialize in "web programming and getting good results on
Google", but you don't seem to know that much about how the spammers
operate and how the web really works. Generally speaking any attempt to
"fight back" by generating more traffic and more search engine entries
will only make things worse. They certainly *won't* go away if you don't
ignore them, and they *probably* won't go away even if you do. Ergo
such "retaliation" is a waste of time.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Nov 2 '05 #5

On 02/11/05, Steve Holden <st***@holdenweb.com> wrote:
Personally I didn't regard the reply as unhelpful, and I believe the
replier was honestly trying to get you to see that your rather naive
suggestion was most unlikely to make things better.

To the OP

A tip for curing your own problem - remove your catchall. Only allow
incoming email and bounces to valid in-use addresses - not to an
infinite number of unused addresses. Repeat after me "Catchalls are
lazy !!! " :)

Catchalls are a spammer's best friend and contribute greatly to the
internet's spam problem AND to the increasingly harsh attempts by ISPs
and other mail providers to reduce their load and spam throughput.
Not to mention the slow lingering death of services that provide a
catchall that can be forwarded.

You used a real domain name in your original post !! Anyone
marking your post as spam using your criteria would have blacklisted
that unfortunate domain.

HTH :)
Nov 2 '05 #6
