
Munging e-mail addresses

I have e-mail addresses in HTML source files both as commented
information and also as part of an e-mail link. Naturally,
I'd like to hide them from harvesters to discourage more spam.

The standard approach simply makes the address human readable, but not
machine readable. For example, my address would be:

brownh @hartford-hwp.com

Which I assume is really no more secure than

brownh at hartford-hwp dot com

because when harvesting software begins to cope with these tricks, it
will have no more trouble with the second than with the first. Because
it is only a matter of time before harvesters are no longer so
easily fooled, I'm looking for a better method.

One technique is to use JavaScript to encrypt the address. I've tried
Hivewire Enkoder and it works fine, except that a significant
percentage of browsers have JavaScript disabled. So this option is
out.

An alternative is to make the address a graphical file. I suppose I
could define the dimensions of this graphic in terms of em, so that
when a user changes his font size, it will change as well. However,
the addresses of concern to me are only in the webpage source, and so
a graphical substitute is no help.

Is there any method (I'm running debian) that will not depend on
javascript and yet is likely to block a harvester for quite some time
to come? It would be nice to have it controlled by a style sheet, so
that the thousands of instances do not have to be updated one by one
as need arises.

--
Haines Brown

Jul 20 '05 #1
11 Replies


>>>>> "Haines" == Haines Brown <br****@teufel.hartford-hwp.com> writes:

Haines> One technique is to use JavaScript to encrypt the address. I've tried
Haines> Hivewire Enkoder and it works fine, except that a significant
Haines> percentage of browsers have JavaScript disabled. So this option is
Haines> out.

Yes, don't do that.

Haines> An alternative is to make the address a graphical file. I suppose I
Haines> could define the dimensions of this graphics in terms of em, so that
Haines> when a user changes his font size, it will change as well. However,
Haines> the addresses of concern to me are only in the webpage source, and so
Haines> a graphical substitute is no help.

No, don't do that either. Blind users will sue your ass.

So far, I've not seen any harvesters even bother with
HTML-de-entitizing, since there are so many "low hanging fruits" that
don't require much CPU processing. If someone could tell me if
they've been harvested as:

Send mail to me at
<a href="mailto:merlyn@stonehenge.com">merlyn@stonehenge.com</a>!

then I'll stop recommending that. But seriously, this is enough to
thwart everyone out there so far. To be doubly safe, encode some of
"mailto:" as well.

When there are 10 million fewer addresses that *aren't* written
that way, I'll change my recommendation. :)
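
For anyone who wants to generate such links rather than type the entities by hand, here is a minimal sketch in Python. It is only an illustration of the character-entity idea described above, not the tool anyone in this thread actually used; the function names and the example.com address are assumptions made up for the example.

```python
import html


def entity_encode(text: str) -> str:
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def munged_mailto(address: str) -> str:
    """Build a mailto link whose href and link text are entity-encoded."""
    encoded = entity_encode(address)
    # Encoding "mailto:" as well follows the "doubly safe" suggestion.
    return f'<a href="{entity_encode("mailto:")}{encoded}">{encoded}</a>'


# Hypothetical address, used only for illustration.
link = munged_mailto("user@example.com")
print(link)

# A browser decodes the references before use; html.unescape does the same.
print(html.unescape(link))
```

The raw output contains no literal "@" and no literal "mailto:", yet any browser turns it back into a normal, clickable link.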

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<me****@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
Jul 20 '05 #2

"Haines Brown" <br****@teufel.hartford-hwp.com> wrote in message
news:87************@teufel.hartford-hwp.com...
I have e-mail addresses in HTML source files both as commented
information and also as part of an e-mail link. Naturally,
I'd like to hide them from harvesters to discourage more spam.


I have a simple and effective solution. Use an email form, with some
standard formmail procedure; in the field where you would normally specify
the recipient's email address, put in a code; then modify the formmail
procedure to map the code into your desired email address.

This completely removes the email address from your HTML, making it
impossible for harvesters to find it. This also prevents your formmail
procedure from being hijacked by spammers.
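
The mapping step might look roughly like the following sketch. It is written in Python purely for illustration; the table contents, the codes, and the function name are hypothetical, and a real formmail script would do this lookup wherever it currently reads the recipient field.

```python
# Hypothetical lookup table: the HTML form carries only the short code,
# so no address ever appears in the page source.
RECIPIENTS = {
    "webmaster": "webmaster@example.com",
    "sales": "sales@example.com",
}


def resolve_recipient(code: str) -> str:
    """Map a form's recipient code to a real address, rejecting anything else."""
    try:
        return RECIPIENTS[code]
    except KeyError:
        # Refusing unknown codes is what stops a spammer from supplying
        # an arbitrary recipient and using the script as a relay.
        raise ValueError(f"unknown recipient code: {code!r}") from None


print(resolve_recipient("sales"))
```

Because the script only ever mails addresses listed in the table, a submitted value such as a full e-mail address is simply rejected.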

The downside is that some hosts don't let you customize their formmail
procedure. (This means YOU, Tina, and it's the only thing which is
preventing me from switching my clients to you! Sigh.)

Jul 20 '05 #3

me****@stonehenge.com (Randal L. Schwartz) writes:
So far, I've not seen any harvesters even bother with
HTML-de-entitizing, since there are so many "low hanging fruits"
that don't require much CPU processing. If someone could tell me if
they've been harvested as:

Send mail to me at <a
href="mailto:merlyn@stonehenge.com">merlyn@stonehenge.com</a>!

then I'll stop recommending that. But seriously, this is enough to
thwart everyone out there so far. To be doubly safe, encode some of
"mailto:" as well.


Elegant solution! Probably the reason it didn't occur to me ;-) I'm
only waiting for a lurker to jump in and explain why your assumptions
are wrong, but so far, so good (I haven't tried it yet under IE).

While I can't use this for the e-mail address that is placed in a
comment in the page source, I figure anyone who is savvy enough to
snoop there is sophisticated enough to understand any tricks I might
play with the address.

--
Haines Brown

Jul 20 '05 #4

On Mon, 21 Jun 2004 17:38:52 GMT, Haines Brown
<br****@teufel.hartford-hwp.com> wrote:
I have e-mail addresses in HTML source files both as commented
information and also as part of an e-mail link. Naturally,
I'd like to hide them from harvesters to discourage more spam.


At a friend's suggestion well over a year ago I started using "Email
Address Encoder" found at: http://www.wbwip.com/wbw/emailencoder.html

It's quick and easy and since implementing it on only one email
address that's posted on one web page I haven't gotten any unwanted
garbage. Maybe I've just been lucky, but it's worked for me. You can
see it in use at the URL below.

Leslie
Leslie's Audio Trivia
http://www.BessieBee.com/Trivia/

"I refuse to have a battle of wits with an unarmed person."
Jul 20 '05 #5

"C A Upsdell" <cupsdell0311XXX@-> wrote in
comp.infosystems.www.authoring.html:
I have a simple and effective solution. Use an email form, with some
standard formmail procedure; in the field where you would normally specify
the recipient's email address, put in a code; then modify the formmail
procedure to map the code into your desired email address.

This completely removes the email address from your HTML, making it
impossible for harvesters to find it. This also prevents your formmail
procedure from being hijacked by spammers.


And it also irritates the heck out of people like me who would
rather use our regular mailer because we like its editor, its spell
checker, and its ability to retain a copy of what we sent.

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
2.1 changes: http://www.w3.org/TR/CSS21/changes.html
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #6

Leslie <bo****@rocketmail.com> writes:
At a friend's suggestion well over a year ago I started using "Email
Address Encoder" found at: http://www.wbwip.com/wbw/emailencoder.html

It's quick and easy and since implementing it on only one email
address that's posted on one web page I haven't gotten any unwanted
garbage. Maybe I've just been lucky, but it's worked for me. You can
see it in use at the URL below.


I might add that this url points to a utility that simply converts all
the characters in an address to character entities. Leslie's mailto,
which uses it, works just fine (my galeon browser brings up my
emacs/rmail reader).

However, this page to which he refers also has a link to a page
( http://www.neokraft.net/sottises/mailencoder/ ) that offers a
service to convert an address into hexadecimal notation. I tried it on
my galeon browser, and it works.

I don't see in principle any advantage of hexadecimal over character
entities, or representing an entire address as character entities
rather than just doing a character or two in "mailto" and your
address, but suspect that if harvesters eventually go to the trouble
to skip empty spaces, convert the word "dot" and "@", and even convert
any character entities to plain ASCII, there is less likelihood they
will also think to convert from hex.
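
To make the comparison concrete, here is a small Python sketch of two things "hexadecimal notation" could mean. Which of them the linked service actually produces is an assumption on my part, and the function names and address are invented for the example.

```python
from urllib.parse import unquote


def percent_encode(text: str) -> str:
    """Encode every byte as a %XX URL escape (usable in the address part of an href)."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


def hex_entity_encode(text: str) -> str:
    """Encode every character as a hexadecimal HTML character reference."""
    return "".join(f"&#x{ord(ch):X};" for ch in text)


address = "user@example.com"  # hypothetical address

# Only the address is escaped; the "mailto:" scheme must stay literal
# for the URL to be recognized as a mail link.
print(f'<a href="mailto:{percent_encode(address)}">mail me</a>')
print(hex_entity_encode(address))

# Either form is reversed by a single standard-library call.
print(unquote(percent_encode(address)))
```

Both forms hide the literal "@", but each is undone in one step, which supports the view that neither is more secure in principle than decimal entities.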

Let me go back to one of my original questions: is it possible to
represent an address in a source page, such as placing in its head:

<!-- contact Haines Brown: br***@hartford-hwp.com -->

in a way that is transparent to a human reader, but opaque to a
harvester?

--
Haines Brown

Jul 20 '05 #7

Haines Brown wrote:
<snip>
I don't see in principle any advantage of hexadecimal over character
entities, or representing an entire address as character entities
rather than just doing a character or two in "mailto" and your
address, but suspect that if harvesters eventually go to the trouble
to skip empty spaces, convert the word "dot" and "@", and even convert
any character entities to plain ASCII, there is less likelihood they
will also think to convert from hex.


Eventually someone will write an e-mail address harvester that uses IE
embedded as a component (it is not that difficult). When they do it will
be able to read IE's interpretation of link URLs straight from the DOM
and character and hex encoding, or javascript tricks, will not help
conceal anything from it.
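
A harvester does not even need an embedded browser to defeat the simpler encodings. The following Python sketch is a deliberately simplified illustration, not a description of any real harvester; the regular expression and function name are assumptions made for the example.

```python
import html
import re
from urllib.parse import unquote

# Simplified address pattern, good enough for illustration only.
ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(page_source: str) -> set:
    """Decode character references and %XX escapes, then collect addresses."""
    decoded = unquote(html.unescape(page_source))
    return set(ADDRESS_RE.findall(decoded))


# Hypothetical page fragment with an entity-encoded mailto link.
encoded = "".join(f"&#{ord(ch)};" for ch in "mailto:user@example.com")
page = f'<a href="{encoded}">contact</a>'

print(harvest(page))
```

Two library calls undo both decimal or hex entities and percent escapes, so the protection rests entirely on harvesters not bothering to make them.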

The observation that currently there are plenty of e-mail addresses
available for little or no effort explains why such a system has not
been created (combined with technical ignorance on the part of people
who are interested in harvesting e-mail addresses). But in the end it
might just be the case that if you put an e-mail address in a public
place you should expect to get spammed.

Richard.
Jul 20 '05 #8

On Mon, 21 Jun 2004, Stan Brown wrote:
"C A Upsdell" <cupsdell0311XXX@-> wrote in
comp.infosystems.www.authoring.html:
I have a simple and effective solution. Use an email form, with some
standard formmail procedure; in the field where you would normally specify
the recipient's email address, put in a code; then modify the formmail
procedure to map the code into your desired email address.

This completely removes the email address from your HTML, making it
impossible for harvesters to find it. This also prevents your formmail
procedure from being hijacked by spammers.


And it also irritates the heck out of people like me who would
rather use our regular mailer because we like its editor, its spell
checker, and its ability to retain a copy of what we sent.


Perhaps so, but you get to use your regular mail program when the person you
are making first contact with via their form REPLIES and thus gives you a
(hopefully valid) return mailbox.

Personally, I use a PHP script to generate and process the form that mails me.

No matter how much someone "munges" an address, that will NEVER stop spam that
finds mailboxes by using a random word attack (including dictionary attacks; a
subset). Smarter spambots will eventually read things like &#64; and such, and
therefore, those constructs only delay the inevitable harvesting.

Using images to display the mailbox, besides being obnoxious to the blind,
won't work where someone is using a non-graphical browser, or has "load images"
turned off.
Jul 20 '05 #9

On Tue, 22 Jun 2004, Haines Brown wrote:
Let me go back to one of my original questions: is it possible to
represent an address in a source page, such as placing in its head:

<!-- contact Haines Brown: br***@hartford-hwp.com -->

in a way that is transparent to a human reader, but opaque to a
harvester?


That IS transparent to the human reader, viewing the page's source
notwithstanding, and probably opaque to a spambot. The fact that it's in a
comment construct doesn't matter; they'll take it.
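
To illustrate why the comment construct makes no difference: a pattern match over the raw page text never parses the HTML, so it has no notion of what is a comment. A minimal Python sketch, using a made-up name and address and a simplified pattern:

```python
import re

# Simplified address pattern, for illustration only.
ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical page with an address only inside an HTML comment.
page_source = """<html><head>
<!-- contact Jane Doe: jane@example.com -->
<title>Example</title></head><body></body></html>"""

found = ADDRESS_RE.findall(page_source)
print(found)  # ['jane@example.com']
```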

What is of concern is that you think that the REAL mailbox should be there, not
that of a spamtrap or honeypot address....
Jul 20 '05 #10

"D. Stussy" <kd****@bde-arc.ampr.org> writes:
On Tue, 22 Jun 2004, Haines Brown wrote:
Let me go back to one of my original questions: is it possible to
represent an address in a source page, such as placing in its head:

<!-- contact Haines Brown: br***@hartford-hwp.com -->

in a way that is transparent to a human reader, but opaque to a
harvester?


That IS transparent to the human reader, viewing the page's source
notwithstanding, and probably opaque to a spambot. The fact that it's in a
comment construct doesn't matter; they'll take it.

What is of concern is that you think that the REAL mailbox should be there, not
that of a spamtrap or honeypot address....


I don't want to drag this out, but I'm not following your point.

The example is of the following line placed in a web page header:

<!-- contact Haines Brown: br***@hartford-hwp.com -->

You seem to agree with my presumption that this is no problem for a
human (viewing the source page) to read. However, I don't understand
why you suggest that a spambot harvester would probably not be able to
read it. Why not?

Are you saying that I am wrong to want my real address to appear in
the header? Why? I started to include it years ago when people would
actually study each other's pages, but maintain it today just because I
feel I should admit responsibility for my work in the source, much as
I would in programming source code.

Haines Brown
Jul 20 '05 #11

On Tue, 22 Jun 2004 12:11:02 +0100, "Richard Cornford"
<Ri*****@litotes.demon.co.uk> wrote:

Eventually someone will write an e-mail address harvester that uses IE
embedded as a component (it is not that difficult).


Indeed, it was around 2002 when I did mine...

Jim.
--
comp.lang.javascript FAQ - http://jibbering.com/faq/

Jul 20 '05 #12
