This article at the BBC reports on what appears to be a genetic
algorithm or random search method for finding words that apparently fool
Bayesian classifiers every time. http://news.bbc.co.uk/1/hi/technology/3458457.stm
The author apparently had to include HTML reporting in the emails to
allow his mail client to report back automatically.
Of course, if he'd used Python, the whole process of email generation and
classification could have been done in a single process, which would
probably have made it easier to generate the magic words.
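A minimal sketch of that single-process loop, assuming a toy filter. The token probabilities, the unknown-token default, the 0.9 threshold, the three-word "pitch" and the function names are all hypothetical, and the scoring is a simplified Graham-style combination over every token, not SpamBayes' own scheme. A real experiment would call the actual classifier wherever `spam_score` is used, with no mail round trip and no web bugs.

```python
import random
from math import prod

# Hypothetical per-token spam probabilities standing in for a trained filter.
TOKEN_PROB = {
    "cheap": 0.99, "pills": 0.99, "offer": 0.95,
    "python": 0.05, "meeting": 0.10, "berkshire": 0.02,
    "marriott": 0.03, "invoice": 0.40, "weather": 0.30,
}
UNKNOWN = 0.4            # assumed probability for tokens the filter has never seen
THRESHOLD = 0.9          # toy filter calls a message spam above this score
PITCH = ["cheap", "pills", "offer"]   # hypothetical sales pitch


def spam_score(tokens):
    """Toy Graham-style combination of per-token spam probabilities."""
    probs = [TOKEN_PROB.get(t, UNKNOWN) for t in set(tokens)]
    p = prod(probs)
    q = prod(1 - x for x in probs)
    return p / (p + q)


def search_magic_words(n_words=3, tries=1000, seed=0):
    """Random search: return the first word set that lets the pitch through."""
    rng = random.Random(seed)
    vocab = sorted(set(TOKEN_PROB) - set(PITCH))
    for _ in range(tries):
        words = rng.sample(vocab, n_words)
        if spam_score(PITCH + words) < THRESHOLD:
            return words
    return None
```

With these toy numbers the only three-word set that pulls the pitch under the threshold is berkshire, marriott and python, so `search_magic_words()` returns those three words in some order.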
Why "Berkshire", "Marriott" etc. should be allowed through is pretty strange
:)
--
Robin Becker
I noticed immediately that the author of the article used the term "ham" to
refer to mail which was not spam. Even if SpamBayes dies an ignominious
death in the future at the hands of some ruthless spammers, that will be our
lasting legacy.
Mr. Graham-Cumming could have avoided the overhead of sending himself 10,000
mails by simply selecting words from his archived public presence on the
net: web pages, Usenet posts or archived mailing list posts associated with
his email address. I suspect his genetic algorithm would have been all but
unnecessary. (Google for "John Graham-Cumming" for example.)
This doesn't have to be a tedious process either. In the course of normal
scumbag email harvesting, all the crawler has to do is select a few
non-trivial words from the harvested page and associate them with the email
address(es) on that page. After seeing the same email address a few times
they would have a decent collection of hammy words for use in the "random
words" block of later spam.
Also, contrary to the statement the author made:
And, he said, this would have to be repeated for every person a spammer
wanted to reach because they would all have a different list of key
words.
this wouldn't have to be done for all email addresses. Anything which
increases the likelihood that a spam is opened will be seen as an
improvement for the spammer. There's obviously no need for them to get a
100% open rate on spam. If that were the case, they'd already all be out of
business.
These research types. They always do things in the hardest way possible...
Skip
Yes, and I've tested this: it's possible to find hammy words this
way too. It wasn't as effective as the technique I pointed out, but
it is nevertheless practical. In my experiments I looked at the
uncommon words found in the locus of my email address, and around
40% were pure ham!
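A rough way to repeat this kind of measurement against your own filter, sketched under assumptions: the stop list, the ten-token window, the 0.1 cutoff for "pure ham", and the example text and probabilities are hypothetical, and `token_prob` stands in for whatever per-token spam probabilities your filter exposes.

```python
import re

# Hypothetical stop list; a real audit would use a much larger one.
COMMON = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on",
          "with", "at", "by", "about", "me", "my"}


def words_near(text, address, window=10):
    """Uncommon words within `window` tokens of any mention of `address`."""
    tokens = text.lower().split()
    target = address.lower()
    found = set()
    for i, tok in enumerate(tokens):
        if target in tok:
            for neighbour in tokens[max(0, i - window):i + window + 1]:
                if "@" in neighbour:
                    continue  # skip the address itself and any other addresses
                word = re.sub(r"[^a-z]", "", neighbour)  # drop punctuation/digits
                if len(word) > 2 and word not in COMMON:
                    found.add(word)
    return found


def hammy_fraction(words, token_prob, cutoff=0.1):
    """Share of `words` the filter already scores at or below `cutoff`.

    Words the filter has never seen default to a neutral 0.5 (not hammy).
    """
    if not words:
        return 0.0
    return sum(token_prob.get(w, 0.5) <= cutoff for w in words) / len(words)
```

For example, `words_near("Contact John at jgc@example.org about POPFile, Python and the Berkshire meeting.", "jgc@example.org")` picks out contact, john, popfile, python, berkshire and meeting; if only python and berkshire score as strongly hammy, the fraction is one third.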
Another way would be to spider the web pages associated with the domain
in the email address; e.g., to attack my address, spider www.jgc.org.
All of this indicates that it should be possible to attack Bayesian
filters with a variety of techniques that rely on the fact that they
are naive (i.e. they'll accept a hammy word no matter where it
appears).
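The naivety shows up directly in the combining rule. Assuming Graham-style combination of per-token spam probabilities $p_i$ (SpamBayes itself defaults to a different, chi-squared combining scheme, so this is only illustrative), the message score and its odds are:

```latex
P(\text{spam} \mid w_1,\dots,w_n) \approx
  \frac{\prod_{i=1}^{n} p_i}{\prod_{i=1}^{n} p_i + \prod_{i=1}^{n} (1 - p_i)},
\qquad
\frac{P}{1 - P} = \prod_{i=1}^{n} \frac{p_i}{1 - p_i}.
```

Because the products are order-independent, a token's position in the message plays no role, and every appended token with $p_k < 1/2$ multiplies the odds by $p_k/(1-p_k) < 1$, wherever in the message it appears.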
John.
In article <ma************ *************** ************@python.org>, Tim
Peters <ti*****@comcast.net> writes
...
If I'm a spammer trying to get my pitches seen by you, and you're using a personal Bayesian classifier, then I need to load my pitches with words that are very hammy to you. If I don't have access to your personal training data (if I do, I already own your machine ...), then I need to *deduce* what's hammy to you. One way to do that, as John Graham-Cumming noted here, is for me to send you thousands of messages with different piles of words, and note which ones did and didn't get caught by your filter. Then I load my sales pitches with words from the ones that your filter didn't reject, and avoid words from the ones your filter did reject. In order to do that, I have to know which messages you did and didn't look at. That's the purpose of the HTML "web bugs" (or "web beacons") in the thousands of test messages. (If your email client renders HTML pages, including fetching images off the net, a spammer can know when you've rendered their message by, e.g., embedding your email address as a parameter in a URL that fetches a .jpg to display.)
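A sketch of the web-bug mechanism described above, assuming a hypothetical tracking host `tracker.example.com` and hypothetical query parameter names `to` and `msg`: the image URL carries the recipient and message identifiers, so a single image fetch tells the sender who rendered which message. A client that does not fetch remote images never makes the request, which is why turning that off defeats the technique.

```python
from html import escape
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical tracking endpoint, for illustration only.
BEACON_BASE = "http://tracker.example.com/pixel.jpg"


def beacon_url(recipient, message_id):
    """Image URL whose query string identifies the recipient and test message."""
    return BEACON_BASE + "?" + urlencode({"to": recipient, "msg": message_id})


def beacon_img_tag(recipient, message_id):
    """HTML that makes a rendering mail client fetch the beacon image."""
    url = escape(beacon_url(recipient, message_id))  # '&' becomes '&amp;' in HTML
    return '<img src="%s" width="1" height="1" alt="">' % url


def decode_hit(requested_url):
    """What the tracker learns from one fetch: who rendered which message."""
    query = parse_qs(urlparse(requested_url).query)
    return query["to"][0], query["msg"][0]
```

On the tracker side, each logged request for the image decodes back to a (recipient, message) pair, which is exactly the "did you look at it" signal the word-selection experiment needs.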
Are you asserting that spammers don't have access to the pdf that
users are filtering? Each filter may be unique, but they can be biassed.
--
Robin Becker
In article <ma************ *************** ************@python.org>, Tim
Peters <ti*****@comcast.net> writes
[Robin Becker]
... are you asserting that spammers don't have access to the pdf that users are filtering?
Sorry, I couldn't make sense of that question.
Each filter may be unique, but they can be biassed.
OK, I guess I'm trying to get at the following hand-waving argument.
Since most people agree about what is ham or spam, there must be a
general recognizer for each. My question, then, is whether it's
possible to define a camouflage mechanism that turns ham into spam or
vice versa. Most people reading a newspaper article would classify it as
spam. If I insert a short ad v ert into the middle the quick
scan process is gone, but I might be able if everything is
set up correctly to get a forbidden word
set into the text in plain si g ht even
though it's specifically fo r bidden by your
all singing and dancing B a yesian analyser. It is well known
that word/space runs are very distracting which is why printers
have long tried to eliminate them.
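A toy version of this camouflage: a word spelled vertically by isolated letters never appears as a token to a word-based filter, yet a reader scanning down the run of spaces can pick it out. The hidden word, the lines and the column below are invented for illustration, and the trick only survives in a fixed-width font with whitespace preserved.

```python
import re


def tokens(text):
    """Word tokens as a simple bag-of-words filter might see them."""
    return set(re.findall(r"[a-z']+", text.lower()))


def column(lines, col):
    """Read one character per line at a fixed column, as the eye follows a river."""
    return "".join(line[col] if col < len(line) else " " for line in lines)


# Toy camouflage: each line isolates a single letter at column 8.
LINES = [
    "we went s outh today",
    "at noon a lone heron",
    "so very l ate it was",
    "and the e nd came",
]
```

Here `column(LINES, 8)` reads "sale", while `tokens` over the same text yields only fragments such as "s", "outh" and "nd", so no per-word probability for the hidden word is ever consulted.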
I don't believe a small cost will kill all spam; every day I get large
amounts of paper adverts, flyers, business cards etc etc. These have
real cost, but presumably are sufficiently market oriented that they pay
for themselves. Putting a cost on email will just reduce the volume of
spam.
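The cost argument can be written as a per-message break-even condition. With $r$ the response rate, $v$ the revenue per response and $c$ the cost of sending one message (symbols introduced here for illustration), a mailing of $N$ messages is worth sending only when

```latex
N\,(r v - c) > 0 \quad\Longleftrightarrow\quad r > \frac{c}{v}.
```

Raising $c$ prunes only the campaigns with $r < c/v$; anything whose expected return per message still exceeds the cost keeps going, as the paper flyers do.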
--
Robin Becker