Cloaking Email Address

Steevo

Any suggestions as to the best programs for cloaking email addresses?
Many thanks

--
Steevo

Jul 24 '05


Dave Anderson

Philip Ronan wrote:

"Dave Anderson" wrote:
Philip Ronan wrote:
"Dave Anderson" wrote:

Philip Ronan wrote:

></SCRIPT><NOSCRIPT>mail [at] example [dot] com</NOSCRIPT>

Do you *really* believe that it's any harder to detect and process this
(and its obvious variants) than it is to process an entity-encoded email
address? This is the equivalent of installing a solid-steel front door
with a dozen deadbolts while leaving your back door wide open!

Yes, *really*

Take a look at <http://www.google.com/search?q=%22at+*+dot%22>, for example.
Plenty of false hits there.
That would matter if spammers cared about false hits, but all the
evidence I've seen says that they don't. Given that they're usually
stealing someone else's resources to send their crap, why should they?
AFAICT they'd be happy with a scheme which harvests as little as 10%
real addresses, which should be pretty easy to accomplish.

You seem to have conveniently forgotten that the words "at" and "dot" crop
up quite frequently in the English language. OTOH, email addresses obey a
strict syntax that means they can be extracted very easily and very
reliably.

No, you're ignoring the fact that, except for obfuscated email
addresses, they don't occur all that often in the pattern <token>
<optional-open-bracket> "at" <matching-optional-close-bracket> <token>
(<optional-open-bracket> "dot" <matching-optional-close-bracket> <token>)+
[where '+' indicates "repeated one or more times"]. Harvesting only
cases which match that pattern is trivial, and I'd expect it to produce
a tolerable level of false positives (I'm not about to loose a spider on
the web to check).
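
For illustration, here is a minimal Python sketch of the pattern described above: a token, an optionally bracketed "at", then one or more tokens joined by an optionally bracketed "dot". The token definition, the list of bracket pairs and the name `harvest` are assumptions made for this example; nothing here is taken from a real harvesting tool.

```python
import re

# Assumption: a "token" is a run of letters, digits, "_" or "-". Real mailbox
# syntax is broader; this only mirrors the pattern described in the post.
TOKEN = r"[A-Za-z0-9][A-Za-z0-9_-]*"
BRACKETS = [("[", "]"), ("(", ")"), ("{", "}"), ("<", ">")]


def separator(word):
    """Regex for `word` either bare (surrounded by spaces) or in matching brackets."""
    bracketed = "|".join(
        rf"{re.escape(o)}\s*{word}\s*{re.escape(c)}" for o, c in BRACKETS
    )
    return rf"(?:\s*(?:{bracketed})\s*|\s+{word}\s+)"


AT, DOT = separator("at"), separator("dot")
PATTERN = re.compile(rf"\b({TOKEN}){AT}({TOKEN}(?:{DOT}{TOKEN})+)", re.IGNORECASE)


def harvest(text):
    """Return candidate addresses rebuilt from "at"/"dot" obfuscation."""
    found = []
    for match in PATTERN.finditer(text):
        local, domain = match.group(1), match.group(2)
        # Collapse every "dot" separator inside the domain part to a real dot.
        domain = re.sub(DOT, ".", domain, flags=re.IGNORECASE)
        found.append(local + "@" + domain)
    return found
```

On the sample text "mail [at] example [dot] com" this rebuilds mail@example.com. It also turns the ordinary phrase "Look at the dot matrix printer" into Look@the.matrix, which is the kind of false hit being argued about in this exchange.
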
Furthermore, if a spammer comes across an email address encoded using HTML
entities, then he can be more or less *certain* that it's a real address.
That's a significant advantage given the existence of pages like this:
<http://ktmatu.com/cgi-bin/rea.pl> and this:
<http://www.hostedscripts.com/scripts/antispam.html>
If spammers start using tools which harvest entity-encoded addresses,
pages like those will adapt and start producing entity-encoded fake
addresses. Advantage nullified.
I also think you're wrong in asserting that spammers are happy with a 10%
hit rate. If they're sending out 10 emails per second from a hijacked
server, of which only one is a valid address, then it would take about 2
weeks to send 1 million emails to real recipients. It's very likely that the
hijacked server would be detected and taken offline long before this happens.
"Happy" is probably an exaggeration, but it's clear (e.g., from the
dictionary attacks that they use) that many of them are willing to try
quite a few addresses in order to find a few working ones.
Why don't you try writing some code to extract email addresses written using
"at/dot" instead of "@/." (and the "obvious" variants)? I think you'll find
this is not as easy as you say it is.
Regular-expression matching is extremely well understood, and there are
plenty of tools around which implement it. Transforming the pattern
above into a proper regular expression is simple. Where's the problem?
Or could you at least look for an email harvesting application that does
this already? I don't think you'll have much luck.

Let's make sure I understand what you're saying. The fact that we don't
know of any harvesting application which extracts "... at ... dot ..."
obfuscated email addresses means that they are safe. But entity-encoded
email addresses are unsafe even though we don't know of any harvesting
application which extracts them. You do see the contradiction, I hope.

Dave

Jul 24 '05 #51

Philip Ronan

"Tim" wrote:

... We really have got
better things to do than turn two minutes of effort in contacting someone
into ten minutes of being mucked about, whatever the reason.
OK, but we're not talking about *minutes* here, are we? Just a few extra
*seconds*, and that only applies to about 1% of the people using my contact
page. The rest can get by with a single click. That's actually making things
much easier than on, e.g., Stan Brown's website where *everybody* has to
copy & paste the email address into a mail client (assuming they have one).
... intelligent suggestions.

1. Make it easy for customers to contact you with a real e-mail address.
I'm doing that already.

.... Oh, wait, you mean an *RFC822-formatted* email address? But if the
spambots get hold of it, I'll have a lot of spam to deal with. So I'll have
to spend more time weeding out the spam from my inbox, and genuine
correspondents are liable to have their emails bounced or swallowed by the
spam filter settings which will inevitably have to be tightened. As a
result, I won't be able to respond to emails so quickly, and there's a
greater chance I won't see them at all.
2. Make it easy for them to use a form to contact you.
Well there's already a form on my contact page. People fill it in, and then
they click the "send" button. How can I make that any easier?
3. Do cautious spam filtering, implemented in an intelligent way.
At the moment I'm getting by with a couple of very simple rules. I'd
probably have to get more aggressive if my email address ended up in a lot
of spam lists, but that could cause problems with genuine emails (see 1).
There's plenty of "do nots", but there's one very important one:

1. Do not make it hard for people to contact you.

Spammers are people too.

I think you'll find that a lot of commercial websites don't contain any
email addresses at all. Now, have you thought about why that might be..?

Because they don't understand the medium. (How many businesses have
useless websites that tell you nothing about them or their products?) They
often have next to useless advertising in other mediums, too.

Because they don't implement good anti-spam techniques.

Because they reckon that they do most of their trade some other way (and of
course, because they've limited the ways that customers can contact them).

Don't you think it might be because:

(a) No spam filtering technique is infallible, either at blocking unwanted
email or accepting genuine email.

(b) Spam makes you less productive, and the best way to avoid it is to be
very careful with your email addresses.

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/

Jul 24 '05 #52

Philip Ronan

"Dave Anderson" wrote:

Philip Ronan wrote:

You seem to have conveniently forgotten that the words "at" and "dot" crop
up quite frequently in the English language. OTOH, email addresses obey a
strict syntax that means they can be extracted very easily and very
reliably.
No, you're ignoring the fact that, except for obfuscated email
addresses, they don't occur all that often in the pattern <token>
<optional-open-bracket> "at" <matching-optional-close-bracket> <token>
(<optional-open-bracket> "dot" <matching-optional-close-bracket> <token>)+
[where '+' indicates "repeated one or more times"]. Harvesting only
cases which match that pattern is trivial, and I'd expect it to produce
a tolerable level of false positives (I'm not about to loose a spider on
the web to check).

Did you look here: <http://www.google.co.uk/search?q=%22at+*+dot%22>

Of the top 10 hits, I think only one is actually a reliable email address.
Even then the presence of the word "at" before it might confuse a simple algorithm.

I should also mention that your algorithm wouldn't have any success
extracting my email address. I won't say *why* it would fail, but you need
to think a bit harder about what the "obvious alternatives" actually are.
If spammers start using tools which harvest entity-encoded addresses,
pages like those will adapt and start producing entity-encoded fake
addresses. Advantage nullified.
That's very true, but *until* this advantage is nullified, your email
address is vulnerable. And once it's on a spam list, there's no point asking
to have it removed. That's why I said you need to be at least 2 steps ahead.

I also think you're wrong in asserting that spammers are happy with a 10%
hit rate....

"Happy" is probably an exaggeration, but it's clear (e.g., from the
dictionary attacks that they use) that many of them are willing to try
quite a few addresses in order to find a few working ones.

But surely you have to agree that their resources would be better spent
using email harvesting techniques that have a better hit rate.

Why don't you try writing some code to extract email addresses written using
"at/dot" instead of "@/." (and the "obvious" variants)? I think you'll find
this is not as easy as you say it is.

Regular-expression matching is extremely well understood, and there are
plenty of tools around which implement it. Transforming the pattern
above into a proper regular expression is simple. Where's the problem?

The problem is that RFC822 email addresses have a clearly defined format
that lends itself to pattern matching. As I mentioned above, a regular
expression based on the pattern you described above would fail on my
website.

Or could you at least look for an email harvesting application that does
this already? I don't think you'll have much luck.

Let's make sure I understand what you're saying. The fact that we don't
know of any harvesting application which extracts "... at ... dot ..."
obfuscated email addresses means that they are safe.

Not "safe", safer. I said they were "safer".
But entity-encoded
email addresses are unsafe even though we don't know of any harvesting
application which extracts them.
except this one: <http://www.mailutilities.com/aee/>
and possibly this one: <http://www.massreach.com/emh/>
and probably several others besides.
You do see the contradiction, I hope.

Nope.

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/

Jul 24 '05 #53

Guy Macon

Tim wrote:

Muck us about, and we just go elsewhere. In the wrong circumstances, one
lost customer can be disastrous. They might be a big job, or refer
countless more to you, or worse, tell many how annoying it was having to
deal with you.

Humans and software both have an error rate when dealing with wanted
emails mixed with spam. If you get enough spam, you *will* treat
some wanted email as spam. One lost email can be disastrous. It
might be an offer of a big job, or one that will refer countless more
to you, or worse, it might end up with someone telling many other
people how rude and annoying you are because you ignored their
email. Unless you can go through thousands of emails without ever
making a classification error, the question becomes one of trade-offs
between the disadvantages of the various choices. A choice with no
disadvantages does not exist.
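
To put a rough number on that, assume each wanted email is misclassified independently with the same small probability. The chance of losing at least one of n wanted emails is then 1 - (1 - p)^n. The sketch below uses made-up figures (a 0.1% error rate over 1,000 wanted emails); the function name and both numbers are assumptions for illustration, not measurements.

```python
def chance_of_losing_wanted_mail(error_rate, wanted_emails):
    """Probability that at least one wanted email is wrongly treated as spam.

    Assumes every email is misclassified independently with the same
    probability, which is a simplification.
    """
    return 1.0 - (1.0 - error_rate) ** wanted_emails


# Hypothetical figures: one mistake per thousand, over a thousand wanted emails.
print(round(chance_of_losing_wanted_mail(0.001, 1000), 3))
```

Even at that low assumed error rate, the chance of at least one loss comes out at roughly 63%, which is why the choice ends up being a trade-off instead of a free win.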

--
Guy Macon
<http://www.guymacon.com/>
<a href="http://www.guymacon.com/">http://www.guymacon.com/</a>

Jul 24 '05 #54

Guy Macon

Philip Ronan wrote:

"Tim" wrote:
Muck us about, and we just go elsewhere.

Go right ahead. I'm trying to strike a balance between avoiding floods of
spam while providing clients with a useful means of contact that they can
use without having their email swallowed by an over-zealous spam blocker. I
won't repeat myself, but if you go back through this thread you'll see that
bare email addresses (with or without html entities and/or mailto: links)
are a liability. What I'm suggesting is the best alternative, under the
circumstances, IMHO.

If people *need* to contact me, then they *can*. Quite easily. Even if they
don't have Javascript enabled. Even if they have a phobia about using
feedback forms. Even if they have an obsessive need to retain a copy of
every message they ever send. If they're going to throw their hands up in
horror and go elsewhere, then frankly I'd see that as an added bonus. I deal
with enough pedantry in my job as it is.

In any case, I doubt that many obsessive HTML dogmatists are likely to have
either (a) any need for my services, or (b) any money to pay for them.

Depending on your business, it might be an advantage to keep some kinds
of users from contacting you. In my case, I was faced with far more
demand for my services than I could supply, so I rewrote my webpage in
such a way that it tends to turn away pointy-haired-bosses while being
attractive to scientists and engineers. I get fewer customers but the
ones I get are much higher quality.

Jul 24 '05 #55

Guy Macon

Dave Anderson wrote:

No, you're ignoring the fact that, except for obfuscated email
addresses, they don't occur all that often in the pattern <token>
<optional-open-bracket> "at" <matching-optional-close-bracket> <token>
(<optional-open-bracket> "dot" <matching-optional-close-bracket> <token>)+
[where '+' indicates "repeated one or more times"]. Harvesting only
cases which match that pattern is trivial, and I'd expect it to produce
a tolerable level of false positives (I'm not about to loose a spider on
the web to check).

That's a lot of effort to harvest a small number of addresses compared
to how many they could harvest if they simply scanned more webpages.

There have been many experiments done, and the answer always comes back
that the actual address harvesters that the actual spammers use don't
attempt to decode obfuscated addresses. It seems that the choke point
is sending spam, not harvesting email addresses, and that any spammer
can get more email addresses than he could possibly use just by looking
for unmodified mailto: links.

I have a theory that the spammers consider obfuscated email addresses
to be more likely to result in reports to RBLs such as spamcop.net.

--
Guy Macon <http://www.guymacon.com/>

Jul 24 '05 #56

Tim

Tim wrote:

1. Make it easy for customers to contact you with a real e-mail
address.
Philip Ronan wrote:

I'm doing that already.

... Oh, wait, you mean an *RFC822-formatted* email address? But if the
spambots get hold of it, I'll have a lot of spam to deal with. So I'll
have to spend more time weeding out the spam from my inbox, and genuine
correspondents are liable to have their emails bounced or swallowed by the
spam filter settings which will inevitably have to be tightened. As a
result, I won't be able to respond to emails so quickly, and there's a
greater chance I won't see them at all.
Actually, I'll go along with the notion that providing a form, and an
e-mail address that a human can copy or write down, counts as fairly "easy"
(so long as it doesn't involve a strange set of decoding instructions).
But so many allegedly professional sites have no real e-mail addresses
anywhere.

I really don't go along with silly scripting, or image file, tricks to
display an e-mail address but not to spammers. I've seen so many of those
that are just awful or simply don't work.

There's plenty of "do nots", but there's one very important one:

1. Do not make it hard for people to contact you.

Spammers are people too.
No, I won't have that... Not at all. They're animals. They deserve
being chained to a chair with spikes on the seat, and made to answer help
line calls for Microsoft technical support.

I think you'll find that a lot of commercial websites don't contain
any email addresses at all. Now, have you thought about why that might
be..?

Because they don't understand the medium. (How many businesses have
useless websites that tell you nothing about them or their products?)
They often have next to useless advertising in other mediums, too.

Because they don't implement good anti-spam techniques.

Because they reckon that they do most of their trade some other way
(and of course, because they've limited the ways that customers can
contact them).

Don't you think it might be because:

(a) No spam filtering technique is infallible, either at blocking
unwanted email or accepting genuine email.
Thereby turning *their* problem into *your* problem. As a business I have
an awful lot of my time wasted by telephone canvassers, but I always
answer the phone. I know that any attempt to automatically screen them
loses potential customers. But it is far easier to screen e-mails:
unlike phone calls they don't require an instant response, and there are
various ways of very effectively filtering out spam.

Statistical analysis is quite effective (message content duplications,
message sources, alleged bounce error messages that don't correspond to
sent messages, etc.).

Content analysis is quite effective (destroy all mails with certain types
of files embedded - executables, batch scripts and so on).

Honeypots are extremely effective (if you get one spam to an unadvertised
honeypot e-mail address you know that every other mail on your system that
matches its contents is a spam - you can erase them all with impunity).

I just wish I could do honeypotting with my mail host. I have my own
domain name, and anytime I get a spam I get the same message sent to every
address at once. I'd be able to wipe them all out instantly, and with no
question of accidentally destroying a genuine e-mail sent to just one of
my addresses.
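
The honeypot idea can be sketched roughly as follows. This is a minimal illustration that assumes spam copies are identical apart from case and whitespace (real spam is often varied per recipient); the function names and the message layout as (recipient, body) pairs are invented for the example.

```python
import hashlib


def fingerprint(body):
    """Hash of the message body with case and whitespace normalised."""
    normalised = " ".join(body.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def filter_by_honeypot(messages, honeypot_addresses):
    """Drop every message whose body matches one sent to a honeypot address.

    `messages` is a list of (recipient, body) pairs.
    """
    spam_prints = {
        fingerprint(body)
        for recipient, body in messages
        if recipient in honeypot_addresses
    }
    return [
        (recipient, body)
        for recipient, body in messages
        if fingerprint(body) not in spam_prints
    ]
```

A message that never reached the honeypot address is left alone, which is what makes this approach attractive when the same spam is sent to every address on a domain at once.
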
(b) Spam makes you less productive, and the best way to avoid it is to
be very careful with your email addresses.

Doesn't work. They get it once, they've got it.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.

Jul 24 '05 #57

D. Stussy

On Thu, 26 May 2005, Philip Ronan wrote:

"D. Stussy" wrote:
On Wed, 25 May 2005, Philip Ronan wrote:

Please explain why I'm pissing these people off.

I assume that he concluded that because these 5-10% of visitors, should they
come to your site, won't see your javascript encoded mailbox link.

No, they just see my email address.
Why not use a tool that ALL users will see - a server-side operation?

[I do understand that a few web-hosting operations may not allow those.
However, you didn't say that you had such a restriction....]

Or how about an email link AND a server-processed form?

Why both? The form by itself is sufficient - and they can get your mailbox
address from your reply. Exposing the link when [also] using the form is just
stupid: You're counting on a certain level of spammer stupidity which need not
continue into the future.

Jul 24 '05 #58

D. Stussy

On Fri, 27 May 2005, Tim wrote:

...
As do many, I dislike using forms. I don't get to keep a record of my
message, without playing cut and paste (i.e. I have a collection of files
outside of my mail client). Many forms are designed by complete morons,
expecting you to type a message in a 5 line by 20 character hole in the
page, or to fill in a plethora of details unrelated to my query, etc.

Is that a problem? If you don't know their e-mail address, it's probably
because this is going to be your first contact with the person. Are you
expecting to be able to submit a PhD dissertation via the form? With such a
form, what you're probably doing is telling the recipient BRIEFLY who you are
and why you want to reach them. When they reply, then you send your
"dissertation." [5x20 is pretty small.]

Jul 24 '05 #59

D. Stussy

On Fri, 27 May 2005, Philip Ronan wrote:

....
I also think you're wrong in asserting that spammers are happy with a 10%
hit rate. If they're sending out 10 emails per second from a hijacked
server, of which only one is a valid address, then it would take about 2
weeks to send 1 million emails to real recipients. It's very likely that the
hijacked server would be detected and taken offline long before this happens.

BS - although it's more likely that such a machine will get listed in an
anti-spam DNSBL (e.g. ordb.org). I've seen infected machines try to hit and
infect mine for several months. 2 weeks is nothing.
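
The "about 2 weeks" figure quoted above can be checked with a few lines of arithmetic. The inputs are the ones given in the quoted text (10 emails per second, one valid address in ten, 1 million real recipients); the function name is just for this sketch.

```python
SECONDS_PER_DAY = 86_400


def days_to_reach(real_recipients, emails_per_second, hit_rate):
    """Days needed to deliver to `real_recipients` valid addresses."""
    valid_per_second = emails_per_second * hit_rate
    return real_recipients / valid_per_second / SECONDS_PER_DAY


# Figures from the quoted post: 10 emails/s, 10% valid, 1 million recipients.
print(round(days_to_reach(1_000_000, 10, 0.10), 1))
```

That works out to a little under twelve days, so "about 2 weeks" is a fair rounding of the quoted numbers.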

Jul 24 '05 #60

Els

D. Stussy wrote:

On Thu, 26 May 2005, Philip Ronan wrote:
"D. Stussy" wrote:
On Wed, 25 May 2005, Philip Ronan wrote:

Please explain why I'm pissing these people off.

I assume that he concluded that because these 5-10% of visitors, should they
come to your site, won't see your javascript encoded mailbox link.

No, they just see my email address.
Why not use a tool that ALL users will see - a server-side operation?

[I do understand that a few web-hosting operations may not allow those.
However, you didn't say that you had such a restriction....]

Or how about an email link AND a server-processed form?

Why both? The form by itself is sufficient - and they can get your mailbox
address from your reply. Exposing the link when [also] using the form is just
stupid: You're counting on a certain level of spammer stupidity which need not
continue into the future.

I do have both a mailto link and a form on my site, and I don't feel
stupid.

So, why do I have a mailto link as well? Cause I think some visitors
may find it more convenient than a form. For one, they can use their
own mail program, thus will have a copy of whatever they wrote.

--
Els http://locusmeus.com/
Sonhos vem. Sonhos vão. O resto é imperfeito.
- Renato Russo -

Jul 24 '05 #61

D. Stussy

On Fri, 27 May 2005, Philip Ronan wrote:

No, what the Javascript is protecting is an email address formatted
according to RFC822. These can be picked up quite easily by spambots. Using
Javascript means the address can't be picked up so easily (or reliably). So
it's safer.

Until you have the javascript-smart spambot....

I've already seen that there are those which have no problem reading things
like ( or &040; or &64. There are even some that see " at " (after
translating non-alphanumerics to spaces) and assume "@". I haven't seen one do
the same for " dot " but that's the next step, along with stripping "nospam" or
"invalid" out.

If e-mail addresses dry up, spambot-writers will investigate why and if
necessary, create these things. I have yet to see an image file being
subjected to OCR software - but that's also not unreasonable to think that they
might in the future....
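
As a rough sketch of the preprocessing described in this post, the Python below decodes HTML character references and then strips "nospam" and "invalid" markers from whatever addresses it finds. The address regex is deliberately simplified, and the marker list and function names are assumptions for illustration only.

```python
import html
import re

# Simplified address pattern; real mailbox syntax allows more than this.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
# Hypothetical list of anti-spam markers a harvester might try to remove.
MARKERS = ("nospam", "invalid")


def strip_markers(address):
    """Remove marker words from the local part and marker labels from the domain."""
    local, _, domain = address.partition("@")
    for marker in MARKERS:
        local = re.sub(marker, "", local, flags=re.IGNORECASE)
    local = local.strip("._-")
    labels = [label for label in domain.split(".") if label.lower() not in MARKERS]
    return local + "@" + ".".join(labels)


def harvest(page):
    """Decode character references, then collect cleaned-up addresses."""
    text = html.unescape(page)
    return [strip_markers(found) for found in ADDRESS.findall(text)]
```

`html.unescape` is part of the Python standard library and handles decimal, hexadecimal and named references alike, so none of the entity variants needs special-casing.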

Jul 24 '05 #62

D. Stussy

On Sat, 28 May 2005, Tim wrote:

There's a rather large problem with that, as it's often implemented. It
allows spammers to use the form. There really isn't any good way to
determine whether someone's providing their own address, or is abusing the
form.

Sloppiness is the author's problem. One can check to see if the mailbox exists
- and with other data collectible (e.g. the client IP address from the
connection), one can check for reasonableness. Someone leaving an "@aol.com"
address coming in on an "aol.com" [dynamic] IP addressed connection is probably
leaving their REAL info. Anyone who exempts their forms' submissions from
their email spam filters is asking for it.
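
The "reasonableness" check could look something like the sketch below. It assumes the server has already done a reverse DNS lookup on the client IP and passes the result in as `client_hostname`; that parameter, the function name and the sample hostnames in the test data are hypothetical.

```python
def plausible_sender(email, client_hostname):
    """True if the connecting host sits under the same domain as the address."""
    if "@" not in email:
        return False
    domain = email.rpartition("@")[2].lower().rstrip(".")
    host = client_hostname.lower().rstrip(".")
    return bool(domain) and (host == domain or host.endswith("." + domain))
```

A match makes the address more believable, but a mismatch proves little: as a later reply in this thread points out, many people use a mail provider that is unrelated to their ISP.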

Jul 24 '05 #63

Philip Ronan

"Tim" wrote:

Philip Ronan wrote:
Spammers are people too.

No, I won't have that... Not at all. They're animals. They deserve
being chained to a chair with spikes on the seat, and made to answer help
line calls for Microsoft technical support.

I was using the word "people" in rather a broad sense there :-D

(a) No spam filtering technique is infallible, either at blocking
unwanted email or accepting genuine email.

[Statistical analysis, Content analysis, Honeypots...]

Yes, they're probably very useful techniques. But not infallible.

Doctors always say it's better to treat the cause of a complaint than to
treat the symptoms. Don't you think the same could be said in this
situation?

(b) Spam makes you less productive, and the best way to avoid it is to
be very careful with your email addresses.

Doesn't work. They get it once, they've got it.

So make sure they don't get it. :-p

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/

Jul 24 '05 #64

Philip Ronan

"D. Stussy" wrote:

Or how about an email link AND a server-processed form?
Why both? The form by itself is sufficient - and they can get your mailbox
address from your reply.

In my case it's essential. Because of the sort of work I do, people need to
send me attachments from time to time. I also need to provide an email
address that agencies can add to their lists without too much difficulty.
Exposing the link when [also] using the form is just
stupid: You're counting on a certain level of spammer stupidity which need
not continue into the future.

So far nobody in this thread has suggested an email harvesting technique
that would successfully pick out the email address from my contact page, so
I don't think I'm being *that* stupid.

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/

Jul 24 '05 #65

Philip Ronan

"D. Stussy" wrote:

On Fri, 27 May 2005, Philip Ronan wrote:
No, what the Javascript is protecting is an email address formatted
according to RFC822. These can be picked up quite easily by spambots. Using
Javascript means the address can't be picked up so easily (or reliably). So
it's safer.
Until you have the javascript-smart spambot....

Those exist already. Some spambots are now built on an Internet Explorer
kernel, so whatever IE sees, the spambot sees. Simple Javascript tricks like
document.write('<A href="mai'+'lto:spam-me@'+'example.com">') are bound
to fail. I'm using something rather more sophisticated.
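
The post describes spambots that run the page in a real browser engine. As a separate illustration (a different and much cruder technique, not the one described), even a static text pass can undo that particular string-splitting trick by gluing adjacent JavaScript string literals back together. The regexes and the function name below are assumptions for this sketch.

```python
import re

# Matches the seam between two adjacent string literals: 'mai' + 'lto:' etc.
CONCAT = re.compile(r"""(['"])\s*\+\s*\1""")
# Simplified mailto pattern; real address syntax allows more than this.
MAILTO = re.compile(r"mailto:([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def harvest_from_script(source):
    """Join split string literals, then look for mailto: addresses."""
    joined = CONCAT.sub("", source)
    return MAILTO.findall(joined)
```

Applied to the `document.write(...)` line quoted above, this recovers spam-me@example.com without executing any JavaScript.
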
I've already seen that there are those which have no problem reading things
like ( or &040; or &64.
I think you'll find they're the same ones.
There are even some that see " at " (after
translating non-alphanumerics to spaces) and assume "@". I haven't seen one
do the same for " dot " but that's the next step, along with stripping
"nospam" or "invalid" out.
Really? Can you back that up? Fortunately they wouldn't be able to pick up
my email address, but I'd still be interested to hear a bit more about them.
If e-mail addresses dry up, spambot-writers will investigate why and if
necessary, create these things.
That's why you need to be at least 2 steps ahead.
I have yet to see an image file being
subjected to OCR software - but that's also not unreasonable to think that
they might in the future....

That wouldn't bother me. People who use image files to display their email
address are already making a big mistake.

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/

Jul 24 '05 #66

Tim

Tim wrote:

There's a rather large problem with that, as it's often implemented.
It allows spammers to use the form. There really isn't any good way to
determine whether someone's providing their own address, or is abusing
the form.

D. Stussy wrote:
Sloppiness is the author's problem. One can check to see if the mailbox
exists - and with other data collectible (e.g. the client IP address from
the connection), one can check for reasonableness. Someone leaving an
"@aol.com" address coming in on an "aol.com" [dynamic] IP addressed
connection is probably leaving their REAL info.
It really can't be done. For instance, if I fill in a form and it offers
to CC a reply to me, I input the address that it's going to use. I could
input anybody's address, and submit them to an unwanted message (or
several thousand messages). I could input my own e-mail address, but you
wouldn't be able to tell, my mail service provider and my ISP are
completely separate - there's nothing to tie them together (IP addresses,
hostnames, domains, nothing).

Now, some sane scripting could check for high speed abuse of a mail form,
or the overall content entered into it, but that's about all. Mail forms
that allow the inputting of e-mail addresses, like that, are a spammer's
dream.
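
A check for "high speed abuse" of a mail form might be sketched as a sliding-window limit per client IP. The class name, the default limits and the in-memory storage are all assumptions for this example; a real deployment would need shared storage and would still not stop a slow or distributed abuser.

```python
import time
from collections import defaultdict, deque


class FormRateLimiter:
    """Allow at most `max_submissions` per client within `window_seconds`."""

    def __init__(self, max_submissions=3, window_seconds=60.0):
        self.max_submissions = max_submissions
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client IP -> recent timestamps

    def allow(self, client_ip, now=None):
        """Record a submission attempt and say whether it should be accepted."""
        now = time.monotonic() if now is None else now
        recent = self._history[client_ip]
        # Forget submissions that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_submissions:
            return False
        recent.append(now)
        return True
```

The `now` argument exists only so the behaviour can be exercised without waiting; in normal use it is left out and the monotonic clock is used.
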
Anyone who exempts their forms' submissions from their email spam
filters is asking for it.

It's a case of someone else's form being used to spam you.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.

Jul 24 '05 #67

Tim

Tim wrote:

As do many, I dislike using forms. I don't get to keep a record of my
message, without playing cut and paste (i.e. I have a collection of
files outside of my mail client). Many forms are designed by complete
morons, expecting you to type a message in a 5 line by 20 character hole
in the page, or to fill in a plethora of details unrelated to my query,

On Sun, 29 May 2005 07:48:19 +0000, D. Stussy wrote:
Is that a problem? If you don't know their e-mail address, it's probably
because this is going to be your first contact with the person. Are you
expecting to be able to submit a PhD dissertation via the form? With such
a form, what you're probably doing is telling the recipient BRIEFLY who
you are and why you want to reach them. When they reply, then you send
your "dissertation." [5x20 is pretty small.]

Yes.

Have you ever tried to file a report to a manufacturer about some hardware
that's failing, and you need to describe it for a warranty return? Or
getting your webhost to fix up the crappiness in their server, or anything
else needing more than a one sentence message?

That's hard to do in a tiny space, and they're the sort of thing that you
want to keep a record of, threaded along with *all* of the messages as you
chat back and forth.

There's plenty of other situations where forms are used in hideous
manners, and they're the only way that you can contact a company.

Oh, and I'm being generous with the 5 by 20 keyhole entry, by the way.
Some are worse than trying to use SMS displays.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.

Jul 24 '05 #68

Tim

Tim wrote:

1. Make it easy for customers to contact you with a real e-mail
address.
Philip Ronan wrote:

I'm doing that already.

... Oh, wait, you mean an *RFC822-formatted* email address? But if the
spambots get hold of it, I'll have a lot of spam to deal with. So I'll
have to spend more time weeding out the spam from my inbox, and genuine
correspondents are liable to have their emails bounced or swallowed by the
spam filter settings which will inevitably have to be tightened. As a
result, I won't be able to respond to emails so quickly, and there's a
greater chance I won't see them at all.
Actually, I'll go along with the notion that providing a form, and an
e-mail address that a human can copy or write down as being fairly "easy"
(so long as it doesn't involve a strange set of decoding instructions).
But so many allegedly professional sites have no real e-mail addresses
anywhere.

I really don't go along with silly scripting, or image file, tricks to
display an e-mail address but not to spammers. I've seen so many of those
that are just awful or simply don't work.

There's plenty of "do nots", but there's one very important one:

1. Do not make it hard for people to contact you. Spammers are people too.
No, I won't have that... Not at all. They're animals. They deserve
being chained to a chair with spikes on the seat, and made to answer help
line calls for Microsoft technical support.

I think you'll find that a lot of commercial websites don't contain
any email addresses at all. Now, have you thought about why that might
be..?

Because they don't understand the medium. (How many businesses have
useless websites that tell you nothing about them or their products?)
They often have next to useless advertising in other mediums, too.

Because they don't implement good anti-spam techniques.

Because they reckon that they do most of their trade some other way
(and of course, because they've limited the ways that customers can
contact them).

Don't you think it might be because:

(a) No spam filtering technique is infallible, either at blocking
unwanted email or accepting genuine email.
Thereby turning *their* problem into *your* problem. As a business I have
an awful lot of my time wasted by telephone canvassers, but I always
answer the phone. I know that any attempt to automatically screen them
loses potential customers. But it is far easier to screen e-mails:
unlike phone calls, they don't require an instant response, and there are
various ways of very effectively filtering out spam.

Statistical analysis is quite effective (message content duplications,
message sources, alleged bounce error messages that don't correspond to
sent messages, etc.).

Content analysis is quite effective (destroy all mails with certain types
of files embedded - executables, batch scripts and so on).

Honeypots are extremely effective (if you get one spam to an unadvertised
honeypot e-mail address you know that every other mail on your system that
matches its contents is spam - you can erase them all with impunity).

I just wish I could do honeypotting with my mail host. I have my own
domain name, and anytime I get a spam I get the same message sent to every
address at once. I'd be able to wipe them all out instantly, and with no
question of accidentally destroying a genuine e-mail sent to just one of
my addresses.
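
The honeypot approach described above can be sketched as follows. This is only an illustration in JavaScript (Node.js), not anything posted in the thread: the message shape, the trap address and the helper names fingerprint and splitSpam are all assumptions, and a real filter would need a sturdier similarity test than an exact match on normalised text.

```javascript
// Hypothetical message shape: { to: string, body: string }.
// HONEYPOTS is an assumed set of addresses that were never published anywhere.
const HONEYPOTS = new Set(["trap@example.com"]);

// Normalise the body so trivial whitespace/case differences don't hide a match.
function fingerprint(body) {
  return body.toLowerCase().replace(/\s+/g, " ").trim();
}

// Any message whose body matches one delivered to a honeypot is treated as spam.
function splitSpam(messages) {
  const spamPrints = new Set(
    messages.filter(m => HONEYPOTS.has(m.to)).map(m => fingerprint(m.body))
  );
  const spam = [];
  const ham = [];
  for (const m of messages) {
    if (spamPrints.has(fingerprint(m.body))) {
      spam.push(m);
    } else {
      ham.push(m);
    }
  }
  return { spam, ham };
}

// Illustrative inbox: one copy hit the trap address, one hit a real address.
const inbox = [
  { to: "trap@example.com", body: "Cheap pills!!!" },
  { to: "sales@example.com", body: "cheap   pills!!!" },
  { to: "sales@example.com", body: "Could you quote for 200 units?" },
];
const result = splitSpam(inbox);
console.log(result.spam.length, result.ham.length);
```

The point of the design is that the decision never depends on guessing at content: a message is discarded only because an identical one arrived at an address no legitimate sender could know.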
(b) Spam makes you less productive, and the best way to avoid it is to
be very careful with your email addresses.

Jul 24 '05 #69

Stan Brown

On Sun, 29 May 2005 09:53:58 +0200, Els <el*********@tiscali.nl>
wrote:

I do have both a mailto link and a form on my site, and I don't feel
stupid.

So, why do I have a mailto link as well? Cause I think some visitors
may find it more convenient than a form. For one, they can use their
own mail program, thus will have a copy of whatever they wrote.

Thank you! I wish more Web authors gave a little thought to the
convenience of legitimate visitors. Spam is an annoyance, but
exaggerated measures to avoid it end up hurting legitimate
correspondence far more than they hurt the spammers.

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
"I feel a wave of morning sickness coming on, and I want to
be standing on your mother's grave when it hits."

Jul 24 '05 #70

D. Stussy

On Thu, 26 May 2005, Philip Ronan wrote:

"D. Stussy" wrote:
On Wed, 25 May 2005, Philip Ronan wrote:

Please explain why I'm pissing these people off.

I assume that he concluded that because these 5-10% of visitors, should they
come to your site, won't see your javascript encoded mailbox link.

No, they just see my email address.
Why not use a tool that ALL users will see - a server-side operation?

[I do understand that a few web-hosting operations may not allow those.
However, you didn't say that you had such a restriction....]

Or how about an email link AND a server-processed form?

Jul 24 '05 #71

D. Stussy

On Fri, 27 May 2005, Tim wrote:

...
As do many, I dislike using forms. I don't get to keep a record of my
message, without playing cut and paste (i.e. I have a collection of files
outside of my mail client). Many forms are designed by complete morons,
expecting you to type a message in a 5 line by 20 character hole in the
page, or to fill in a plethora of details unrelated to my query, etc.

Jul 24 '05 #72

D. Stussy

On Fri, 27 May 2005, Philip Ronan wrote:

....
I also think you're wrong in asserting that spammers are happy with a 10%
hit rate. If they're sending out 10 emails per second from a hijacked
server, of which only one is a valid address, then it would take about 2
weeks to send 1 million emails to real recipients. It's very likely that the
hijacked server would be detected and taken offline long before this happens
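
The arithmetic behind that estimate can be checked quickly. The sketch below uses only the figures quoted above (10 messages per second, one valid address in ten, 1 million real recipients); the variable names are made up for illustration.

```javascript
// Figures taken from the post: 10 messages/second, 1 in 10 addresses valid.
const sendRatePerSecond = 10;
const validFraction = 0.1;
const targetDeliveries = 1000000;

// Effective rate of messages that reach a real recipient.
const validPerSecond = sendRatePerSecond * validFraction;

// Time needed to reach the target, in seconds and then in days.
const seconds = targetDeliveries / validPerSecond;
const days = seconds / 86400;

console.log(days.toFixed(1) + " days");
```

That comes to a little under 12 days, which is consistent with the "about 2 weeks" in the post.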

Jul 24 '05 #73

Els

D. Stussy wrote:

On Thu, 26 May 2005, Philip Ronan wrote:
"D. Stussy" wrote:
On Wed, 25 May 2005, Philip Ronan wrote:

Please explain why I'm pissing these people off.

I assume that he concluded that because these 5-10% of visitors, should they
come to your site, won't see your javascript encoded mailbox link.

No, they just see my email address.
Why not use a tool that ALL users will see - a server-side operation?

[I do understand that a few web-hosting operations may not allow those.
However, you didn't say that you had such a restriction....]

Or how about an email link AND a server-processed form?

Why both? The form by itself is sufficient - and they can get your mailbox
address from your reply. Exposing the link when [also] using the form is just
stupid: You're counting on a certain level of spammer stupidity which need not
continue into the future.

Jul 24 '05 #74

clusters78

Wow I take a peek at this group for the first time in years, and this
is what comes to mind.

I remember I used to browse this group back in 97/98, when animated
gifs were considered cutting edge technology and Netscape 2.0 was the
latest and greatest thing.

Back then, c.i.w.a.h had a formula:

* someone posts a presentation question, usually not w3c compliant or
even (god forbid!) about javascript
* ab*****@fnx.com would post an extremely rude, insulting,
self-important one line answer aimed not to help but to insult the OP
from a high horse
* Alan Flavell or the Arnoud "Galactus" Englefried guy then gives a
more civilized spiel about what a bad idea that is, at the same time
still totally ignoring the original question
* Then either the Jukka guy, Stan Brown, Arjun Ray (not quite as
insolent as abigail but nearly as self-important), or one of the other
dogmatists I no longer remember would post, chastising the original
poster for not being considerate about people who are blind, can only
surf the web with one hand, still use a 14.4 modem because they live in
Mozambique, etc. etc. And how not minding that extreme minority would
cause business to be lost, earth to crack open, Tim Berners Lee to
personally come to your house and whip you, etc.
* Original poster becomes extremely (and understandably) offended
* Flame war ensues
* other sheep that like to echo the resident dogmatists chime in
* lurkers like me read the group, rotfloao, go back to writing our PERL
script, Macromedia Director (now Flash) etc. (For some time I actually
sided with the dogmatists, because clearly they spent a lot of time to
be proficient at what they do and I do respect the diligence and
varying degrees of talent. However, after some time it became clear:
the web was passing them by (exact time of passing probably 2000-01 or
so), not to mention the self-important preachiness of some of the
regulars was an extreme turn-off.)

Now in 2005, with DSL even considered "slow" in the web-savvy nations,
and web developers making mega bucks for writing non w3c compliant code
and using Flash and DHTML and the latest and greatest
whatchamawebgadget, I do think the "real world" has basically
proclaimed a loud *cough* you to the ciwah dogmas.

Meanwhile c.i.w.a.h has stayed in a time capsule, evidently.

The more things change, the more they stay the same.

Jul 24 '05 #75

D. Stussy

On Fri, 27 May 2005, Philip Ronan wrote:

No, what the Javascript is protecting is an email address formatted
according to RFC822. These can be picked up quite easily by spambots. Using
Javascript means the address can't be picked up so easily (or reliably). So
it's safer.

Jul 24 '05 #76

D. Stussy

On Sat, 28 May 2005, Tim wrote:

There's a rather large problem with that, as it's often implemented. It
allows spammers to use the form. There really isn't any good way to
determine whether someone's providing their own address, or is abusing the
form.

Jul 24 '05 #77

Philip Ronan

"Tim" wrote:

Philip Ronan wrote:
Spammers are people too.

No, I won't have that... Not at all. They're animals. They deserve
being chained to a chair with spikes on the seat, and made to answer help
line calls for Microsoft technical support.

I was using the word "people" in rather a broad sense there :-D

(a) No spam filtering technique is infallible, either at blocking
unwanted email or accepting genuine email.

[Statistical analysis, Content analysis, Honeypots...]

(b) Spam makes you less productive, and the best way to avoid it is to
be very careful with your email addresses.

Doesn't work. They get it once, they've got it.

So make sure they don't get it. :-p

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/

Jul 24 '05 #78

Guy Macon

Philip Ronan wrote:

Those exist already. Some spambots are now built on an Internet Explorer
kernel, so whatever IE sees, the spambot sees.

You have evidence of this?

Programs that harvest as you web surf don't count.

Programs that look at a bunch of URLs or filenames you provide
don't count.

Only actual spambots - robots that spider the web looking for
email addresses - count as being spambots.

Jul 24 '05 #79

Philip Ronan

"D. Stussy" wrote:

Or how about an email link AND a server-processed form?
Why both? The form by itself is sufficient - and they can get your mailbox
address from your reply.

In my case it's essential. Because of the sort of work I do, people need to
send me attachments from time to time. I also need to provide an email
address that agencies can add to their lists without too much difficulty.
Exposing the link when [also] using the form is just
stupid: You're counting on a certain level of spammer stupidity which need
not continue into the future.

Jul 24 '05 #80

Guy Macon

clusters78 wrote:

The more things change, the more they stay the same.

Including, it seems, trolls who pop into a perfectly fine discussion
about cloaking email addresses and go off on a wild tangent.

Why didn't you start a new thread?

Jul 24 '05 #81

Philip Ronan

"D. Stussy" wrote:

On Fri, 27 May 2005, Philip Ronan wrote:
No, what the Javascript is protecting is an email address formatted
according to RFC822. These can be picked up quite easily by spambots. Using
Javascript means the address can't be picked up so easily (or reliably). So
it's safer.
Until you have the javascript-smart spambot....

Those exist already. Some spambots are now built on an Internet Explorer
kernel, so whatever IE sees, the spambot sees. Simple Javascript tricks like
document.write('<A href="mai'+'lto:spam-me@'+'example.com">') are bound
to fail. I'm using something rather more sophisticated.
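
To illustrate why simple string-splitting is fragile, here is a rough sketch (JavaScript for Node.js) showing that a harvester would not even need a browser engine for the document.write() example above: folding adjacent string literals back together is enough. The function names foldConcatenation and findMailto are invented for this sketch, the folding is deliberately naive, and nothing here is taken from any actual harvesting tool.

```javascript
// Collapse 'abc'+'def' (single- or double-quoted literals) into one run of text
// by deleting the closing quote, the "+", and the opening quote between them.
// Naive on purpose: it does not check that the quote characters match.
function foldConcatenation(source) {
  return source.replace(/(['"])\s*\+\s*(['"])/g, "");
}

// Look for a mailto: address in the folded source text.
function findMailto(source) {
  const folded = foldConcatenation(source);
  const match = folded.match(/mailto:([^"'\s>?]+@[^"'\s>?]+)/i);
  return match ? match[1] : null;
}

// The script text from the post, treated as plain page source.
const page = `document.write('<A href="mai'+'lto:spam-me@'+'example.com">')`;
console.log(findMailto(page));
```

Anything that actually executes the script, as an IE-based tool presumably would, gets the same result without needing any such folding step.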
I've already seen that there are those which have no problem reading things
like ( or &040; or &64.
I think you'll find they're the same ones.
There are even some that see " at " (after
translating non-alphanumerics to spaces) and assume "@". I haven't seen one
do the same for " dot " but that's the next step, along with stripping
"nospam" or "invalid" out.
Really? Can you back that up? Fortunately they wouldn't be able to pick up
my email address, but I'd still be interested to hear a bit more about them.
If e-mail addresses dry up, spambot-writers will investigate why and if
necessary, create these things.
That's why you need to be at least 2 steps ahead.
I have yet to see an image file being
subjected to OCR software - but that's also not unreasonable to think that
they might in the future....

That wouldn't bother me. People that use image files to display their email
address are already making a big mistake.

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/

Jul 24 '05 #82

Guy Macon

Tim wrote:

As a business I have an awful lot of my time wasted by telephone
canvassers, but I always answer the phone. I know that any
attempt to automatically screen them loses potential customers.

Start getting a hundred per hour and your opinion about answering
your phone is likely to change.

Jul 24 '05 #83

Guy Macon

Spamming is a classic case of bottlenecking. Nobody puts more effort
into cutting down trees if the lumbermill can't handle the ones they
are already cutting down. Nobody increases lumbermill capacity if
the roads/rails can't transport the current output. Spamming is the
same; you harvest email addresses then spam them. Look at the evidence
available from examining the spammers at work; it's obvious that they
are not making even slight efforts to harvest more email addresses,
but they are making a huge effort to get past spam filters. It's pretty
clear that the delivery is the bottleneck.

--
Guy Macon <http://www.guymacon.com/>

Jul 24 '05 #84

Philip Ronan

"Guy Macon" wrote:

Philip Ronan wrote:
Those exist already. Some spambots are now built on an Internet Explorer
kernel, so whatever IE sees, the spambot sees.

You have evidence of this?

Programs that harvest as you web surf don't count.

Programs that look at a bunch of URLs or filenames you provide
don't count.

Only actual spambots - robots that spider the web looking for
email addresses - count as being spambots.

OK, here are a few for you to try out:

<http://www.soft32.com/download_12423.html>
"A simple yet invaluable tool that can automatically dig out all of the
email addresses from web pages, text files, and even scripts!" -- sounds
really friendly, doesn't it?

<http://www.mailutilities.com/aee/>
I'm not so sure about this one, but it's based on an IE kernel so it should
have no problems with html entities and possibly Javascript links too.

<http://www.email-marketing-easy.com/desc/indexN3516.html>
By the sound of things, this one can break just about anything that uses
document.write() to generate a clickable mailto link. There's more info at
<http://1ahosting.com/scammers.html> -- scroll down until you see the bit
about a program called "EFGrabber".

I don't run Windows, so I can't test any of these personally.

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/

Jul 24 '05 #85

Tim

Tim wrote:

There's a rather large problem with that, as it's often implemented.
It allows spammers to use the form. There really isn't any good way to
determine whether someone's providing their own address, or is abusing
the form.

D. Stussy wrote:
Sloppiness is the author's problem. One can check to see if the mailbox
exists - and with other data collectible (e.g. the client IP address from
the connection), one can check for reasonableness. Someone leaving an
"@aol.com" address coming in on an "aol.com" [dynamic] IP addressed
connection is probably leaving their REAL info.
It really can't be done. For instance, if I fill in a form and it offers
to CC a reply to me, I input the address that it's going to use. I could
input anybody's address, and subject them to an unwanted message (or
several thousand messages). I could input my own e-mail address, but you
wouldn't be able to tell: my mail service provider and my ISP are
completely separate - there's nothing to tie them together (IP addresses,
hostnames, domains, nothing).

Now, some sane scripting could check for high speed abuse of a mail form,
or the overall content entered into it, but that's about all. Mail forms
that allow the inputting of e-mail addresses, like that, are a spammer's
dream.
Anyone who exempts their forms' submissions from their email spam
filters is asking for it.
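
The "check for high speed abuse" idea mentioned above might look something like the sketch below (JavaScript for Node.js). It is a minimal in-memory sliding-window limiter under assumed limits: WINDOW_MS, MAX_SUBMISSIONS, the recent map and the allowSubmission function are all hypothetical names, and keying on the client IP address is just one possible choice.

```javascript
// Hypothetical limits: at most MAX_SUBMISSIONS per client within WINDOW_MS.
const WINDOW_MS = 60 * 1000;
const MAX_SUBMISSIONS = 3;

// Client key (for example an IP address) -> timestamps of accepted submissions.
const recent = new Map();

// Returns true if the submission should be accepted, false if it looks like abuse.
function allowSubmission(clientKey, now = Date.now()) {
  const cutoff = now - WINDOW_MS;
  // Keep only the timestamps that still fall inside the window.
  const stamps = (recent.get(clientKey) || []).filter(t => t > cutoff);
  if (stamps.length >= MAX_SUBMISSIONS) {
    recent.set(clientKey, stamps);
    return false;
  }
  stamps.push(now);
  recent.set(clientKey, stamps);
  return true;
}

// Four rapid submissions from one client: the fourth is refused.
const t0 = 1000000;
const decisions = [0, 1, 2, 3].map(i => allowSubmission("203.0.113.7", t0 + i * 1000));
console.log(decisions);

// Once the window has passed, the same client is accepted again.
console.log(allowSubmission("203.0.113.7", t0 + WINDOW_MS + 5000));
```

As the post says, this only limits the speed of abuse. It does nothing to establish whether the address typed into the form belongs to the person submitting it.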

Jul 24 '05 #86

Tim

Tim wrote:

As do many, I dislike using forms. I don't get to keep a record of my
message, without playing cut and paste (i.e. I have a collection of
files outside of my mail client). Many forms are designed by complete
morons, expecting you to type a message in a 5 line by 20 character hole
in the page, or to fill in a plethora of details unrelated to my query,

On Sun, 29 May 2005 07:48:19 +0000, D. Stussy wrote:
Is that a problem? If you don't know their e-mail address, it's probably
because this is going to be your first contact with the person. Are you
expecting to be able to submit a PhD dissertation via the form? With such
a form, what you're probably doing is telling the recipient BRIEFLY who
you are and why you want to reach them. When they reply, then you send
your "dissertation." [5x20 is pretty small.]

Jul 24 '05 #87

Dave Anderson

Philip Ronan wrote:

"Dave Anderson" wrote:
Philip Ronan wrote:
You seem to have conveniently forgotten that the words "at" and "dot" crop
up quite frequently in the English language. OTOH, email addresses obey a
strict syntax that means they can be extracted very easily and very
reliably.
No, you're ignoring the fact that, except for obfuscated email
addresses, they don't occur all that often in the pattern <token>
<optional-open-bracket> "at" <matching-optional-close-bracket> <token>
(<optional-open-bracket> "dot" <matching-optional-close-bracket> <token>)+
[where '+' indicates "repeated one or more times"]. Harvesting only
cases which match that pattern is trivial, and I'd expect it to produce
a tolerable level of false positives (I'm not about to loose a spider on
the web to check).

Did you look here: <http://www.google.co.uk/search?q=%22at+*+dot%22>

Of the top 10 hits, I think only one is actually a reliable email
address.

Yes, I looked there. Three don't match the pattern I gave, and one more
doesn't match if the test is extended slightly to check for a valid
top-level domain (this could easily be extended further to use DNS to
check whether the full domain name exists and has an A or MX record --
which would probably eliminate the vast majority of false matches). Two
match real email addresses, and one more matches a possibly-real address
that only appears in the TITLE element. One matches something that
isn't an email address, and two more match something that isn't an email
address which appears only in the TITLE element. I don't know whether
current address-harvesting tools check the TITLE or just the body. So,
worst case, 2 real addresses and 3 false addresses. Given some of the
antics I've seen spammers go through, I expect that many of them would
find this ratio to be acceptable.
Even then the presence of the word "at" before it might a simple algorithm.

This doesn't make any sense -- what did you mean to say?
I should also mention that your algorithm wouldn't have any success
extracting my email address. I won't say *why* it would fail, but you need
to think a bit harder about what the "obvious alternatives" actually are.
The only URL I've found you giving in this thread is
<http://vzone.virgin.net/phil.ronan/>, and I don't see any trace of an
email address there (obfuscated or not). If you want to challenge me,
kindly provide a URL. I've been arguing based on the example you gave a
while back

<SCRIPT type="text/javascript">
generateEmailLink();
</SCRIPT><NOSCRIPT>mail [at] example [dot] com</NOSCRIPT>

Based on the

phil [dot] ronan @ virgin [dot] net

in your signature and other considerations, I do have to slightly extend
the pattern I gave earlier -- by allowing the username to be <token>
(<bracket> dot <bracket> <token>)+ rather than just <token>, and by
allowing literal "@" and "." as well as "at" and "dot", but this doesn't
make any substantial change to my argument. I should also note that,
especially since you're assuming that they'll be harvesting
entity-encoded email addresses, I'd expect them to start by translating
all entities to the appropriate characters before doing anything else --
so trying to confuse the match by including some entified characters
will have no effect.

If spammers start using tools which harvest entity-encoded addresses,
pages like those will adapt and start producing entity-encoded fake
addresses. Advantage nullified.

That's very true, but *until* this advantage is nullified, your email
address is vulnerable. And once it's on a spam list, there's no point asking
to have it removed. That's why I said you need to be at least 2 steps ahead.

This applies equally well to "... at ... dot ..." obfuscated addresses
-- both forms are about equally safe (or unsafe), which is why using
JavaScript encoding to avoid the one while still using the other doesn't
make any sense.

I also think you're wrong in asserting that spammers are happy with a 10%
hit rate....

"Happy" is probably an exaggeration, but it's clear (e.g., from the
dictionary attacks that they use) that many of them are willing to try
quite a few addresses in order to find a few working ones.

But surely you have to agree that their resources would be better spent
using email harvesting techniques that have a better hit rate.

What matters is what they actually do, not what we think it would be
sensible for them to do. Given their actual use of things like
dictionary attacks, assuming that they'll only use techniques which
produce a very high percentage of real addresses is foolish.

Why don't you try writing some code to extract email addresses written using
"at/dot" instead of "@/." (and the "obvious" variants)? I think you'll find
this is not as easy as you say it is.

Regular-expression matching is extremely well understood, and there are
plenty of tools around which implement it. Transforming the pattern
above into a proper regular expression is simple. Where's the problem?
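
For what it's worth, the pattern described above (with the later extension for dotted usernames and literal "@" and ".") could be written along these lines. This is a sketch in JavaScript for Node.js under several assumptions: the names TOKEN, DOT, AT, ADDRESS, decodeNumericEntities and harvest are invented here, the brackets are not required to match as the original pattern asks, only numeric character references are decoded, and no attempt is made to validate the domain.

```javascript
// Decode numeric character references (such as &#64; or &#x40;) before matching,
// as the post suggests a harvester would do first.
function decodeNumericEntities(text) {
  return text
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)));
}

const TOKEN = "\\w[\\w-]*";
// "dot" (optionally bracketed) or a literal ".", with optional spaces around it.
const DOT = "\\s*(?:[\\[({<]\\s*dot\\s*[\\])}>]|\\bdot\\b|\\.)\\s*";
// "at" (optionally bracketed) or a literal "@", with optional spaces around it.
const AT = "\\s*(?:[\\[({<]\\s*at\\s*[\\])}>]|\\bat\\b|@)\\s*";

// username = token (dot token)*, domain = token (dot token)+
const ADDRESS = new RegExp(
  `(${TOKEN}(?:${DOT}${TOKEN})*)${AT}(${TOKEN}(?:${DOT}${TOKEN})+)`,
  "gi"
);
const DOT_RE = new RegExp(DOT, "gi");

// Return every candidate address found, rewritten in ordinary user@domain form.
function harvest(html) {
  const text = decodeNumericEntities(html);
  const clean = part => part.replace(DOT_RE, ".");
  const found = [];
  for (const m of text.matchAll(ADDRESS)) {
    found.push(`${clean(m[1])}@${clean(m[2])}`);
  }
  return found;
}

console.log(harvest("<NOSCRIPT>mail [at] example [dot] com</NOSCRIPT>"));
console.log(harvest("phil [dot] ronan @ virgin [dot] net"));
console.log(harvest("mail&#64;example&#46;com"));
```

A pattern this loose will also match ordinary prose such as "look at example dot com", which is exactly the false-positive question being argued in this thread. The sketch shows only that the matching itself is straightforward, not what hit rate it would achieve.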

The problem is that RFC822 email addresses have a clearly defined format
that lends itself to pattern matching. As I mentioned above, a regular
expression based on the pattern you described above would fail on my
website.

You've said that a couple of times, but I haven't found any place where
you provide a URL addressing a page which includes an address obfuscated
in whatever way you use. Until you provide such a URL, there's no way I
can know whether or not it's easily recognizable.

Or could you at least look for an email harvesting application that does
this already? I don't think you'll have much luck.

Let's make sure I understand what you're saying. The fact that we don't
know of any harvesting application which extracts "... at ... dot ..."
obfuscated email addresses means that they are safe.

Not "safe", safer. I said they were "safer".

Not much. As you have said elsewhere in this thread, if you put
something on the web (or in a newsgroup) it's there forever -- and,
given that it *is* technically feasible, they'll probably start
harvesting them some day (if they haven't already).

But entity-encoded
email addresses are unsafe even though we don't know of any harvesting
application which extracts them.

except this one: <http://www.mailutilities.com/aee/>
and possibly this one: <http://www.massreach.com/emh/>
and probably several others besides.

In an earlier message in this thread, someone posted

Practical experiment says that this technique, even just using @
does avoid them being blagged by spammers.

Clearly the spammers could bypass it, but they aren't bothering to.

and you replied
Maybe not yet. But as soon as they do (and I'm sure they will), you're
basically sunk.

which said to me that you didn't know of any tools harvesting
entity-encoded email addresses. I've been arguing on that basis, since
I don't claim to keep track of the capabilities of current spammer tools.

A quick look at those websites provided no information one way or the
other about whether they harvest entity-encoded email addresses. I'm
curious: how do you know that they do this (and how do you know that no
other tool harvests "... at ... dot ..." obfuscated email addresses)?

Dave

Jul 24 '05 #90

Dave Anderson

Guy Macon wrote:

Dave Anderson wrote:
No, you're ignoring the fact that, except for obfuscated email
addresses, they don't occur all that often in the pattern <token>
<optional-open-bracket> "at" <matching-optional-close-bracket> <token>
(<optional-open-bracket> "dot" <matching-optional-close-bracket> <token>)+
[where '+' indicates "repeated one or more times"]. Harvesting only
cases which match that pattern is trivial, and I'd expect it to produce
a tolerable level of false positives (I'm not about to loose a spider on
the web to check).

That's a lot of effort to harvest a small number of addresses compared
to how many they could harvest if they simply scanned more webpages.

There have been many experiments done, and the answer always comes back
that the actual address harvesters that the actual spammers use don't
attempt to decode obfuscated addresses. It seems that the choke point
is sending spam, not harvesting email addresses, and that any spammer
can get more email addresses than he could possibly use just by looking
for unmodified mailto: links.

I have a theory that the spammers consider obfuscated email addresses
to be more likely to result in reports to RBLs such as spamcop.net.

You may be right, but that would apply equally to entity-encoded email
addresses -- and the issue at hand was the *difference* in safety
between entity-encoded addresses and "... at ... dot ..." obfuscated ones.

Dave

Jul 24 '05 #97

Dave Anderson

Philip Ronan wrote:

"Dave Anderson" wrote:

The real issue is whether JavaScript-encoding the mailto link provides
any real security for the email address relative to more
generally-accessible schemes such as entity-encoding the link; in the
presence of your "... [at] ... [dot] ..." alternative, it doesn't --
since that alternative exposes what the JavaScript-encoding is intended
to protect.

No, what the Javascript is protecting is an email address formatted
according to RFC822. These can be picked up quite easily by spambots. Using
Javascript means the address can't be picked up so easily (or reliably). So
it's safer.

That's true in a very literal sense, but it's of no value if the
underlying email address can be derived from other information on the page.

Dave

Jul 24 '05 #98

Bart Lateur

Philip Ronan wrote:

So far nobody in this thread has suggested an email harvesting technique
that would successfully pick out the email address from my contact page, so
I don't think I'm being *that* stupid.

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/

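#!/usr/local/bin/perl -w
# Sample input: an obfuscated address sitting between two script blocks.
$_ = '</SCRIPT>username [at] domain [dot] com<SCRIPT type="text/javascript">';
# Match name parts joined by "." or "[dot]", then "@" or "[at]", then the
# domain parts, again joined by "." or "[dot]".
if (my ($address) =
    /(\w+\s*((?:\[dot\]|\.)\s*\w+)*\s*(?:\@|\[at\])\s*\w+\s*((?:\[dot\]|\.)\s*\w+)*)/i)
{
    for ($address) {
        s/\[dot\]/./ig;    # "[dot]" becomes "."
        s/\[at\]/\@/ig;    # "[at]" becomes "@"
        s/\s+//g;          # drop the padding whitespace
    }
    print "Found mailto address: <$address>\n";
}
__END__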
Result:
Found mailto address: <username@domain.com>

--
Bart.

Jul 24 '05 #99

Philip Ronan

"Dave Anderson" wrote:

Philip Ronan wrote:

No, what the Javascript is protecting is an email address formatted
according to RFC822. These can be picked up quite easily by spambots. Using
Javascript means the address can't be picked up so easily (or reliably). So
it's safer.

That's true in a very literal sense, but it's of no value if the
underlying email address can be derived from other information on the page.

It *is* valuable if the underlying email address is difficult to extract
*automatically*.
--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/

Jul 24 '05 #100
