Cloaking Email Address

Any suggestions as to the best programs for cloaking email addresses?
Many thanks

--
Steevo


Jul 24 '05
"Bart Lateur" wrote:
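#!/usr/local/bin/perl -w
# Sample input: an obfuscated address sandwiched between script tags.
$_ = '</SCRIPT>username [at] domain [dot] com<SCRIPT type="text/javascript">';
# Match user[.part]* (@ or [at]) domain[.part]*, accepting "[dot]" for "."
# and optional whitespace around each separator.
if(my($address) =
/(\w+\s*((?:\[dot\]|\.)\s*\w+)*\s*(?:\@|\[at\])\s*\w+\s*((?:\[dot\]|\.)\s*\w+)*)/i)
{
for($address) {
s/\[dot\]/./ig; # "[dot]" becomes "."
s/\[at\]/\@/ig; # "[at]" becomes "@"
s/\s+//g; # drop the padding whitespace
}
print "Found mailto address: <$address>\n";
}
__END__
Result:
Found mailto address: <username@domain.com>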


OK, that's a start I suppose.

Now extend this regular expression so it can also detect these email
addresses:

username_at_domain.com
username [at] another domain [dot] com
user-at-domain-dot-com

... plus all the other variations on this theme. And don't forget you'll
have to strip out HTML tags first.

Not quite as easy as you think, is it?
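
For anyone who wants to try the challenge above against their own pages, here is a minimal Python sketch (it is not the Perl from the quoted post). The separator lists `AT` and `DOT` and the function name `find_candidates` are illustrative assumptions, and they cover only a handful of spellings.

```python
import re

# Assumed separator spellings; real pages use many more.
AT = r"(?:\s*\[at\]\s*|\s+at\s+|_at_|-at-|\s*@\s*)"
DOT = r"(?:\s*\[dot\]\s*|\s+dot\s+|_dot_|-dot-|\s*\.\s*)"
WORD = r"[A-Za-z0-9]+"  # deliberately excludes "_" and "-" so "_at_" can be seen

PATTERN = re.compile(
    rf"({WORD}(?:{DOT}{WORD})*){AT}({WORD}(?:{DOT}{WORD})+)",
    re.IGNORECASE,
)


def find_candidates(text):
    """Return candidate addresses with separators normalised to '@' and '.'."""
    found = []
    for match in PATTERN.finditer(text):
        local = re.sub(DOT, ".", match.group(1), flags=re.IGNORECASE)
        domain = re.sub(DOT, ".", match.group(2), flags=re.IGNORECASE)
        found.append(f"{local}@{domain}")
    return found


if __name__ == "__main__":
    for sample in (
        "username_at_domain.com",
        "user-at-domain-dot-com",
        "username [at] another domain [dot] com",
        "Look at example dot com",
    ):
        print(repr(sample), "->", find_candidates(sample))
```

With these assumed separators the sketch recovers the first and third variants listed above, finds nothing in "username [at] another domain [dot] com" because of the extra space inside the domain, and reports a bogus address for an ordinary phrase such as "Look at example dot com". Loosening the pattern to catch more variants tends to raise that false-hit rate.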

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/
Jul 24 '05 #101
"Dave Anderson" wrote:
Philip Ronan wrote:
Even then the presence of the word "at" before it might a simple
algorithm.


This doesn't make any sense -- what did you mean to say?


... might *confuse* a simple algorithm. Sorry about that!
I should also mention that your algorithm wouldn't have any success
extracting my email address. I won't say *why* it would fail, but you need
to think a bit harder about what the "obvious alternatives" actually are.


The only URL I've found you giving in this thread is
<http://vzone.virgin.net/phil.ronan/>, and I don't see any trace of an
email address there (obfuscated or not). If you want to challenge me,
kindly provide a URL. I've been arguing based on the example you gave a
while back


OK, I was actually referring to my business site which is here:
<http://www.japanesetranslator.co.uk/>

You could have found this quite easily yourself, but I didn't want to give
it to you straight away just to emphasize the point that your "obvious
alternatives" aren't as obvious as you might think.

I just visited another site this morning where the author included a mailto
link with "@" replaced by "_at_". You wouldn't have found that one either.
There are probably thousands of other ways of concealing email addresses in
a similar manner. If you write an algorithm that extracts all of them, then
I expect your false hit rate is going to be extremely high.
If spammers start using tools which harvest entity-encoded addresses,
pages like those will adapt and start producing entity-encoded fake
addresses. Advantage nullified.


That's very true, but *until* this advantage is nullified, your email
address is vulnerable. And once it's on a spam list, there's no point asking
to have it removed. That's why I said you need to be at least 2 steps ahead.


This applies equally well to "... at ... dot ..." obfuscated addresses
-- both forms are about equally safe (or unsafe), which is why using
JavaScript encoding to avoid the one while still using the other doesn't
make any sense.


Well what I'm saying is that "at/dot" and its variants are much safer
because they are a lot harder to detect reliably.
But surely you have to agree that [spammers'] resources would be better
spent using email harvesting techniques that have a better hit rate.


What matters is what they actually do, not what we think it would be
sensible for them to do. Given their actual use of things like
dictionary attacks, assuming that they'll only use techniques which
produce a very high percentage of real addresses is foolish.


OK, so what *are* they actually doing? I've already pointed out [1] software
that (as far as I can tell) extracts Javascript mailto links and decodes
HTML entities. So we can assume the spammers are doing that already.

[1] <http://groups.google.co.uk/group/com...uthoring.html/
msg/e4ddd8c603db86ed?hl=en>

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/
Jul 24 '05 #102
Philip Ronan wrote:
"Dave Anderson" wrote:
Philip Ronan wrote:
I should also mention that your algorithm wouldn't have any success
extracting my email address. I won't say *why* it would fail, but you need
to think a bit harder about what the "obvious alternatives" actually are.


The only URL I've found you giving in this thread is
<http://vzone.virgin.net/phil.ronan/>, and I don't see any trace of an
email address there (obfuscated or not). If you want to challenge me,
kindly provide a URL. I've been arguing based on the example you gave a
while back


OK, I was actually referring to my business site which is here:
<http://www.japanesetranslator.co.uk/>

You could have found this quite easily yourself, but I didn't want to give
it to you straight away just to emphasize the point that your "obvious
alternatives" aren't as obvious as you might think.


I probably could have, but I've got better things to do than spend time
searching the net for a site where you couldn't be bothered to provide a
URL. Given that whether or not *I* find a site is totally irrelevant to
whether or not an address-harvesting tool will find it, your coyness
doesn't accomplish what you seem to think it would.

Now that I've seen the page, I know that what you've done is corrupt the
address in a way which you assume a person will figure out how to
reverse (in this case, by inserting whitespace where it's not allowed).
I just visited another site this morning where the author included a mailto
link with "@" replaced by "_at_". You wouldn't have found that one either.
There are probably thousands of other ways of concealing email addresses in
a similar manner. If you write an algorithm that extracts all of them, then
I expect your false hit rate is going to be extremely high.


I'm sure it's what you expect, but it's probably not what would happen.
[Well, extracting *all* of them is unlikely -- but extracting
essentially all of the commonly-used schemes ought to be practical.]

I don't usually pay much attention to how people obfuscate addresses,
but this thread has started me thinking about it. If I wanted to write
an address extractor, I'd start off by doing some research (including
loosing a spider to bring back a whole bunch of stuff for analysis).
Based only on what I know now, what I'd do is:

1) read each page and convert each entity to its proper character.
2) remove all element start and end tags, saving any attribute-value
strings within them for processing later in the same way as the page
content; for elements whose content is handled out-of-line (e.g., TITLE,
SCRIPT), also save the element's content for processing later.
3) add a single space at the start and end of the text, and collapse
each sequence of whitespace characters into a single space.
4) for each occurrence of '@', 'at' (upper or lower case, or mixed case,
or using non-ascii Unicode characters which look like 'a' or 't'), reset
the list of valid domains then search backward for a username candidate
and forward for a hostdomain candidate; skip this occurrence unless
plausible candidates for both are found.
a) in this section, ignore any space character immediately adjacent to
the subject text. first, remove any matching brackets surrounding the
subject text: {} / [] / <> / () and additional non-ascii Unicode
characters. Also remove any punctuation character which occurs on both
sides of the subject text. Repeat until all such pairs have been
removed, collapsing any sequences of multiple spaces to single spaces.
Initially set a flag to 'true' if the subject text is a punctuation
character, false otherwise; set the flag to true whenever a pair of
punctuation characters is removed.
b) If the flag is false and the character immediately adjacent on either
side to the subject text is not a space, skip this occurrence.
c) scan forward for the first occurrence of '.', 'dot' (upper or lower
case, or mixed case, or using non-ascii Unicode characters which look
like 'd', 'o', or 't'); if none is found (or if the number of characters
scanned over which are allowed in domain name segments exceeds 63), skip
to scanning for the next 'at'. Process the subject text as in 4(a). If
the flag is false and the character immediately adjacent on either side
to the subject text is not a space, skip to scanning for the next 'at'.
d) Remove all characters not allowed in domain name segments from the
text scanned over in 4(c) and save the result plus '.' as the candidate
hostdomain.
e) repeat 4(c) until it tries to skip to scanning for the next 'at'; for
each successful case, remove all characters not allowed in domain name
segments from the text scanned over in 4(c) and append the result plus
'.' to the candidate hostdomain.
f) scan forward until as many characters allowed in domain name segments
have been scanned as the length of the longest top-level domain.
Whenever at least one such character has been scanned and the next
character is a punctuation character, query DNS for an MX record for the
domain name consisting of the candidate hostdomain plus the scanned
characters with all those not allowed in domain name segments removed.
If the DNS query returns one or more MX records, save that domain name
as a valid domain.
g) if there is at least one entry in the valid domain list, scan
backward from the first non-space character before the 'at' for the
first punctuation character which is not allowed in a username; if at
least one character was scanned over and none of the characters scanned
over were ones not allowed in a username, prefix the characters scanned
over plus '@' to each item in the valid domain list and save it as a
harvested email address.

That should be fairly efficient, find just about all email addresses
obfuscated using the general "... at ... dot ..." pattern (including
unobfuscated ones), and still have a fairly high percentage of real
addresses. [The algorithm for scanning usernames can probably be
improved; I've run out of steam for now.]
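
Steps 1 to 3 above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `TextAndAttrs` and the function `preprocess` are made up here, attribute values are simply appended after the page text rather than processed in a separate pass, and the scanning and DNS lookups of step 4 are left out entirely.

```python
import re
from html.parser import HTMLParser


class TextAndAttrs(HTMLParser):
    """Collect page text and, separately, every attribute value."""

    def __init__(self):
        # Step 1: convert_charrefs=True turns entities into characters.
        super().__init__(convert_charrefs=True)
        self.text = []   # character data with the tags removed
        self.saved = []  # attribute values kept for later scanning

    def handle_starttag(self, tag, attrs):
        # Step 2: the tag itself is dropped, its attribute values are saved.
        for _name, value in attrs:
            if value:
                self.saved.append(value)

    def handle_data(self, data):
        # Element content, including TITLE and SCRIPT bodies, lands here.
        self.text.append(data)


def preprocess(page):
    parser = TextAndAttrs()
    parser.feed(page)
    parser.close()
    combined = "".join(parser.text) + " " + " ".join(parser.saved)
    # Step 3: collapse whitespace runs and pad with a single space at each end.
    return " " + re.sub(r"\s+", " ", combined).strip() + " "


if __name__ == "__main__":
    page = ('<p>Mail  <a href="mailto:user&#64;example.com">'
            'us<b>er</b>&#64;example.com</a>\n now</p>')
    print(repr(preprocess(page)))
```

Joining the text chunks without a separator means an address split by inline tags, as in the sample page, comes out in one piece before any pattern matching is attempted.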
If spammers start using tools which harvest entity-encoded addresses,
pages like those will adapt and start producing entity-encoded fake
addresses. Advantage nullified.

That's very true, but *until* this advantage is nullified, your email
address is vulnerable. And once it's on a spam list, there's no point asking
to have it removed. That's why I said you need to be at least 2 steps ahead.


This applies equally well to "... at ... dot ..." obfuscated addresses
-- both forms are about equally safe (or unsafe), which is why using
JavaScript encoding to avoid the one while still using the other doesn't
make any sense.


Well what I'm saying is that "at/dot" and its variants are much safer
because they are a lot harder to detect reliably.


Harder, yes. Impractical, no. See above.
But surely you have to agree that [spammers'] resources would be better
spent using email harvesting techniques that have a better hit rate.


What matters is what they actually do, not what we think it would be
sensible for them to do. Given their actual use of things like
dictionary attacks, assuming that they'll only use techniques which
produce a very high percentage of real addresses is foolish.


OK, so what *are* they actually doing? I've already pointed out [1] software
that (as far as I can tell) extracts Javascript mailto links and decodes
HTML entities. So we can assume the spammers are doing that already.

[1] <http://groups.google.co.uk/group/com...uthoring.html/
msg/e4ddd8c603db86ed?hl=en>


Since the underlying question is whether spammers are willing to use
harvesting techniques which produce lots of false positives, this is not
relevant. We've *seen* spammers using dictionary attacks (which
necessarily involve a very high fraction of false addresses), so we
*know* that at least some spammers are willing to use such techniques.

Dave

Jul 24 '05 #103
Philip Ronan wrote:
"Dave Anderson" wrote:
Philip Ronan wrote:
No, what the Javascript is protecting is an email address formatted
according to RFC822. These can be picked up quite easily by spambots. Using
Javascript means the address can't be picked up so easily (or reliably). So
it's safer.


That's true in a very literal sense, but it's of no value if the
underlying email address can be derived from other information on the page.


It *is* valuable if the underlying email address is difficult to extract
*automatically*


Which it isn't.

Dave

Jul 24 '05 #104
Philip Ronan wrote:
OK, that's a start I suppose.

Now extend this regular expression so it can also detect these email
addresses:

username_at_domain.com
username [at] another domain [dot] com
user-at-domain-dot-com
The algorithm I posted a few minutes ago (before seeing this posting)
handles all of those (and a lot more).
... plus all the other variations on this theme. And don't forget you'll
have to strip out HTML tags first.

Not quite as easy as you think, is it?


Wrong.

Dave

Jul 24 '05 #105
Philip Ronan wrote:
"Tim" wrote:
Philip Ronan wrote:

(b) Spam makes you less productive, and the best way to avoid it is to
be very careful with your email addresses.


Doesn't work. They get it once, they've got it.


So make sure they don't get it. :-p


Which is not possible. For example:

1) dictionary-style attacks

2) someone else posting your address in a harvestable form

3) posting your address in what you think is a non-harvestable form
which someone later starts harvesting

You can certainly make it less likely that they'll manage to harvest
your email address, but it's not entirely under your control. And when
they do get it, the only defense (short of spamming becoming an
actively-enforced capital offense worldwide) is good blocking.

Dave

Jul 24 '05 #106


Dave Anderson wrote:
username_at_domain.com
username [at] another domain [dot] com
user-at-domain-dot-com


The algorithm I posted a few minutes ago (before seeing this posting)
handles all of those (and a lot more).


How about this one that I used in 2003?

guymacon+" http://www.guymacon.com/ "03@spamcop.net

Perfectly legal according to RFC-2822, and I got no reports of
anyone not being able to email me, but I used it in newsgroup
posts and on web pages for all of 2003, and not a single spammer
harvested my address.
Jul 24 '05 #107
"Dave Anderson" wrote:
[Re: email addresses of the form "user at another example dot com"]

I've got better things to do than spend time
searching the net for a site where you couldn't be bothered to provide a
URL.
You don't get it do you? According to you, a "trivial" algorithm should have
been able to extract this email address automatically. Instead, I have to
point you right at it before you can do anything about it.
Now that I've seen the page, I know that what you've done is corrupt the
address in a way which you assume a person will figure out how to
reverse (in this case, by inserting whitespace where it's not allowed).
Corrupted? Not allowed?? It wouldn't be legal in an RFC822 email address,
but that's not what we're discussing here. Anyone with a brain can figure
out what to do with this email address.
Based only on what I know now, what I'd do is:
[SNIP: 3.3 KB of half-baked pseudo code]
Let me just remind you what you said last week:
Do you *really* believe that it's any harder to detect and process this
(and its obvious variants) than it is to process an entity-encoded email
address?


Earlier on I showed you FIVE LINES of PHP that can extract an RFC822
formatted email address (with or without HTML entity encoding) from any web
page. Now to be fair, I should admit that an additional 3 lines are needed
to decode numerical entities, and the regular expression could probably be
improved.
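
The PHP referred to here is not reproduced in this part of the thread, so the following is not that code. It is a minimal Python sketch of the same idea, showing how little is needed to pull a plainly formatted address out of a page even when it is entity-encoded. `EMAIL_RE` is a deliberately simplified pattern (an assumption, far narrower than the full RFC 822 grammar) and `extract_plain_addresses` is a hypothetical name.

```python
import html
import re

# Simplified address pattern; an assumption, not the RFC 822 grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_plain_addresses(page):
    """Decode named and numeric entities, then collect address-like strings."""
    return EMAIL_RE.findall(html.unescape(page))


if __name__ == "__main__":
    page = '<a href="&#109;ailto:user&#64;example&#46;com">mail me</a>'
    print(extract_plain_addresses(page))
```

Python's `html.unescape` handles decimal, hexadecimal and named entities in one call, so the separate decoding lines mentioned above are not needed in this version.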

But how many lines of code do you think your algorithm would run to? How
long would it take you to write and debug all of this? And what sort of hit
rate do you think you would achieve? Do you still honestly believe this is a
"trivial" matter?

Your algorithm would still fail at other "obvious alternatives" like
"mailATexampleDOTcom", or "m a i l @ e x a m p l e . c o m". How would you
work these into your algorithm? Do you honestly believe spammers are going
to tackle these addresses before they start decoding html entities? And
please bear in mind what I said about being 2 steps ahead. If the spammers
move on to entities, I'll move on to something else like Turing numbers.
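
As a rough illustration of what handling those two variants might involve, here is a small Python sketch. The rules are assumptions made for this example only: capital AT and DOT count as markers when sandwiched between lowercase letters or digits, and a run of three or more single characters separated by single spaces is treated as a spaced-out word. The names `undo_caps_markers` and `undo_letter_spacing` are hypothetical.

```python
import re

# Case-sensitive on purpose: only capital markers between lowercase text count.
CAPS_AT = re.compile(r"(?<=[a-z0-9])AT(?=[a-z0-9])")
CAPS_DOT = re.compile(r"(?<=[a-z0-9])DOT(?=[a-z0-9])")
# Three or more single characters, each separated by exactly one space.
SPACED = re.compile(r"(?<!\S)(?:\S ){2,}\S(?!\S)")


def undo_caps_markers(text):
    """Turn 'mailATexampleDOTcom' style text into 'mail@example.com'."""
    return CAPS_DOT.sub(".", CAPS_AT.sub("@", text))


def undo_letter_spacing(text):
    """Close up runs of single characters such as 'm a i l @ ...'."""
    return SPACED.sub(lambda m: m.group(0).replace(" ", ""), text)


if __name__ == "__main__":
    print(undo_caps_markers("mailATexampleDOTcom"))
    print(undo_letter_spacing("write to m a i l @ e x a m p l e . c o m today"))
```

Each rule is short, but each one also rewrites harmless text (any list of single letters gets closed up, for instance), which is where the argument about false hits comes in.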
Since the underlying question is whether spammers are willing to use
harvesting techniques which produce lots of false positives, this is not
relevant. We've *seen* spammers using dictionary attacks (which
necessarily involve a very high fraction of false addresses), so we
*know* that at least some spammers are willing to use such techniques.


My Oxford dictionary has 192000 words in it. Are you suggesting a spammer is
going to send me 192000 emails in the hope that just one will get through?
These attacks might produce results at domains like hotmail.com, but not on
my website. Any email to a non-existent mailbox goes straight into a black
hole.

I think it's about time you admitted defeat :-)

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/
Jul 24 '05 #108
On Mon, 30 May 2005 23:52:56 +0000, Guy Macon
<_see.web.page_@_www.guymacon.com_> wrote:
How about this one that I used in 2003?

guymacon+" http://www.guymacon.com/ "03@spamcop.net

Perfectly legal according to RFC-2822, and I got no reports of
anyone not being able to email me, but I used it in newsgroup
posts and on web pages for all of 2003, and not a single spammer
harvested my address.


What made you stop using it, then?

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
"I feel a wave of morning sickness coming on, and I want to
be standing on your mother's grave when it hits."
Jul 24 '05 #109

Stan Brown wrote:

Guy Macon <http://www.guymacon.com/> wrote:
How about this one that I used in 2003?

guymacon+" http://www.guymacon.com/ "03@spamcop.net

Perfectly legal according to RFC-2822, and I got no reports of
anyone not being able to email me, but I used it in newsgroup
posts and on web pages for all of 2003, and not a single spammer
harvested my address.


What made you stop using it, then?


I am running a series of long-term experiments using different
techniques. The disadvantages of the above email address are that
some humans falsely conclude that it isn't valid and thus don't
try to email me, and that some web forms falsely conclude that
it isn't valid and won't let me enter it.


Jul 24 '05 #110
Guy Macon wrote:
Dave Anderson wrote:
username_at_domain.com
username [at] another domain [dot] com
user-at-domain-dot-com


The algorithm I posted a few minutes ago (before seeing this posting)
handles all of those (and a lot more).


How about this one that I used in 2003?

guymacon+" http://www.guymacon.com/ "03@spamcop.net

Perfectly legal according to RFC-2822, and I got no reports of
anyone not being able to email me, but I used it in newsgroup
posts and on web pages for all of 2003, and not a single spammer
harvested my address.


I didn't think of that one, so it wouldn't be handled. OTOH, I did note
that some research needed to be done and that the username handling
could be improved.

Dave

Jul 24 '05 #111
Guy Macon wrote:
Philip Ronan wrote:
Those exist already. Some spambots are now built on an Internet Explorer
kernel, so whatever IE sees, the spambot sees.


You have evidence of this?


I have, at least of the core technology:

<http://search.cpan.org/~abeltje/Win32-IE-Mechanize-0.008/>
This indeed does run Javascript before you get the page back, so
anything written using document.write() is in the result.

--
Bart.
Jul 24 '05 #112
[Apologies for the delayed response; I've been a bit busy.]

Philip Ronan wrote:
"Dave Anderson" wrote:
[Re: email addresses of the form "user at another example dot com"]

I've got better things to do than spend time
searching the net for a site where you couldn't be bothered to provide a
URL.
You don't get it do you? According to you, a "trivial" algorithm should have
been able to extract this email address automatically. Instead, I have to
point you right at it before you can do anything about it.


If you can't tell the difference between the tasks of finding something
within a particular page and of finding a page which contains some
particular bit of information, you've got a serious problem with
rational thought.
Now that I've seen the page, I know that what you've done is corrupt the
address in a way which you assume a person will figure out how to
reverse (in this case, by inserting whitespace where it's not allowed).


Corrupted? Not allowed?? It wouldn't be legal in an RFC822 email address,
but that's not what we're discussing here. Anyone with a brain can figure
out what to do with this email address.


Play word games all you want, but don't think it isn't obvious that
you're trying to obscure the real issues.
Based only on what I know now, what I'd do is:

[SNIP: 3.3 KB of half-baked pseudo code]


That "half-baked" code clearly demonstrated the practicality of doing
what you claimed couldn't be done.
Let me just remind you what you said last week:
Do you *really* believe that it's any harder to detect and process this
(and its obvious variants) than it is to process an entity-encoded email
address?

I'll admit that I should have said "significantly" rather than "any",
but that doesn't make any great difference to the real issue -- which is
whether such processing is practical.
Earlier on I showed you FIVE LINES of PHP that can extract an RFC822
formatted email address (with or without HTML entity encoding) from any web
page. Now to be fair, I should admit that an additional 3 lines are needed
to decode numerical entities, and the regular expression could probably be
improved.

But how many lines of code do you think your algorithm would run to? How
long would it take you to write and debug all of this? And what sort of hit
rate do you think you would achieve? Do you still honestly believe this is a
"trivial" matter?
Whoopti-doo! Processing one form takes less code than processing the
other. Since I've demonstrated that neither takes an *impractical*
amount of code, this difference has negligible bearing on which
processing spammer tools will implement.
Your algorithm would still fail at other "obvious alternatives" like
"mailATexampleDOTcom", or "m a i l @ e x a m p l e . c o m". How would you
work these into your algorithm? Do you honestly believe spammers are going
to tackle these addresses before they start decoding html entities? And
please bear in mind what I said about being 2 steps ahead. If the spammers
move on to entities, I'll move on to something else like Turing numbers.
Given that your old encodings are captured in various archives, you
can't "move on" (unless you're willing to abandon all of your old
addresses).
Since the underlying question is whether spammers are willing to use
harvesting techniques which produce lots of false positives, this is not
relevant. We've *seen* spammers using dictionary attacks (which
necessarily involve a very high fraction of false addresses), so we
*know* that at least some spammers are willing to use such techniques.


My Oxford dictionary has 192000 words in it. Are you suggesting a spammer is
going to send me 192000 emails in the hope that just one will get through?
These attacks might produce results at domains like hotmail.com, but not on
my website. Any email to a non-existent mailbox goes straight into a black
hole.


I detect a true master of the non sequitur.
I think it's about time you admitted defeat :-)


Should I admit that I'll never change your mind? Sure -- you've made it
abundantly clear that you're unwilling to listen to anything that
contradicts your ideas. I'm done with this thread.

Should I admit that I'm wrong? Certainly not. You haven't produced any
successful counter-arguments to any of my major points.

Dave

Jul 24 '05 #113

Here is (one) right way to handle email addresses.

Get an email account with spamcop.net [ http://spamcop.net ].
This alone scares off many email spammers; they often purge
all spamcop addresses in order to stay off the Spamcop blocklist
a bit longer.

"Cloak" your email address like this:

When dealing with Spishak corp, give your email address as
no************@spamcop.net.

When putting your email address on your webpage, use something
like no***********@spamcop.net.

When emailing Bill Gates, use no**********@spamcop.net

As soon as you start getting a few spams at any of these addresses,
report the spams to Spamcop, which will filter out all email from
that source, not only for you but for all users of the spamcop
blocklist. If they hit someone else who reports to spamcop first,
you will never see that first one.

If you start getting a lot of spam to an address, block it and
select a new one. What I do is to start using the new address,
then a month later I whitelist anyone who has ever emailed me at
the old address and block the rest.

(Spamcop has the feature of being able to retrieve email from
multiple accounts with other ISPs, filtering them, and forwarding
them to any email address you choose, so you can even keep all
your old accounts alive)

No obfuscating, munging, or tricks with character entities or
JavaScript needed.

-------------------------------

About "Plussed" email addresses:

RFC 2822 (which replaces section 6 of RFC 822) says that "+" is legal
when used on the left side of the "@" character in email addresses.
See sections 3.4.1 and 3.2.5 at http://www.ietf.org/rfc/rfc2822.txt or
http://www.faqs.org/rfcs/rfc2822.html for details.

Newer versions of Sendmail accept such "plussed" email addresses,
discarding everything from the "+" to just before the "@". This
can help you to track who sells your email address and in spam
filtering. Many ISPs allow you to have a plussed email address;
try sending one to your present email address and see. Virtually
all ISPs allow you to send plussed email addresses.
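
A minimal Python sketch of the plussed-address idea, assuming the behaviour described above (everything from the "+" up to the "@" is discarded on delivery). The helper names and the example.com addresses are hypothetical, and whether a given mail host actually strips the tag has to be checked by testing.

```python
def tag_address(address, tag):
    """Insert '+tag' before the '@', e.g. one tag per correspondent."""
    local, _, domain = address.rpartition("@")
    if not local or not domain:
        raise ValueError(f"not an address: {address!r}")
    return f"{local}+{tag}@{domain}"


def strip_tag(address):
    """Recover the base mailbox by dropping everything from the first '+'."""
    local, _, domain = address.rpartition("@")
    return f"{local.split('+', 1)[0]}@{domain}"


if __name__ == "__main__":
    tagged = tag_address("user@example.com", "webpage")
    print(tagged)             # address to publish in one place only
    print(strip_tag(tagged))  # mailbox the message would be delivered to
```

When spam arrives, the tag still visible in the recipient header shows which copy of the address leaked.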

Jul 24 '05 #114
On Mon, 30 May 2005 23:52:56 +0000,
Guy Macon<_see.web.page_@_www.guymacon.com_> wrote:

guymacon+" http://www.guymacon.com/ "03@spamcop.net

Perfectly legal according to RFC-2822, and I got no reports of
anyone not being able to email me, but I used it in newsgroup
posts and on web pages for all of 2003, and not a single spammer
harvested my address.


According to RFC-2822, the "local-part" is either a
"dot-atom", "quoted-string" or an "obs-local-part".
What you have before the "@" is none of those.

Am I missing something?
--n
Jul 24 '05 #115

Nisse Engström wrote:


On Mon, 30 May 2005 23:52:56 +0000,
Guy Macon<_see.web.page_@_www.guymacon.com_> wrote:

guymacon+" http://www.guymacon.com/ "03@spamcop.net

Perfectly legal according to RFC-2822, and I got no reports of
anyone not being able to email me, but I used it in newsgroup
posts and on web pages for all of 2003, and not a single spammer
harvested my address.


According to RFC-2822, the "local-part" is either a
"dot-atom", "quoted-string" or an "obs-local-part".
What you have before the "@" is none of those.

Am I missing something?


Nope. I made an error. I cut and pasted the above from my
"legal under RFC 822" example in my text file. For RFC-2822
I would have had to put quotes around the entire local part
or used string+string@... as my example. Sorry about that.

I just put a warning in all caps into the text file I cut
and pasted that from so that I won't make that mistake again.
I apologize for the error.

Jul 24 '05 #116
On Sun, 12 Jun 2005 13:13:40 +0000,
Guy Macon<_see.web.page_@_www.guymacon.com_> wrote:

Nisse Engström wrote:

On Mon, 30 May 2005 23:52:56 +0000,
Guy Macon<_see.web.page_@_www.guymacon.com_> wrote:

guymacon+" http://www.guymacon.com/ "03@spamcop.net


According to RFC-2822, the "local-part" is either a
"dot-atom", "quoted-string" or an "obs-local-part".
What you have before the "@" is none of those.

Am I missing something?


Nope. I made an error. I cut and pasted the above from my
"legal under RFC 822" example in my text file. For RFC-2822
I would have had to put quotes around the entire local part
or used string+string@... as my example. Sorry about that.


[Sorry about the late reply (again), but I didn't look
into this until I read RFC 822 for some other reason.]

I still don't see how that address is valid.

The RFC 822 rules are:

addr-spec     = local-part "@" domain
local-part    = word *("." word)
word          = atom / quoted-string
atom          = 1*<any CHAR except specials, SPACE and CTLs>
quoted-string = <"> *(qtext/quoted-pair) <">

1. There is only one <word> because there is no "." to the
left of the "@".
2. The <word> is not an <atom>, because <atom> does not
contain <SPACE>.
3. The <word> is not a <quoted-string>, because it does
not begin and end with double quotes.

The following would be valid RFC 822 addresses (I think):

"guymacon+\" http://www.guymacon.com/ \"03"@spamcop.net
guymacon." http://www.guymacon.com/ ".03@spamcop.net
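
The five rules quoted above are small enough to check mechanically. Below is a minimal Python sketch of such a check for the local part only. It is an assumption-laden illustration: it ignores comments and folding white space, which the full RFC 822 grammar also permits, and the function names are made up for this example.

```python
SPECIALS = set('()<>@,;:\\".[]')  # the RFC 822 "specials"


def is_atom_char(ch):
    # Any CHAR except specials, SPACE and CTLs: printable ASCII minus specials.
    return 32 < ord(ch) < 127 and ch not in SPECIALS


def parse_word(s, i):
    """Return the index just past one <word> starting at s[i], or -1."""
    if i < len(s) and s[i] == '"':          # quoted-string
        i += 1
        while i < len(s):
            ch = s[i]
            if ch == "\\":                  # quoted-pair: backslash + any CHAR
                if i + 1 >= len(s):
                    return -1
                i += 2
            elif ch == '"':                 # closing quote
                return i + 1
            elif ch == "\r" or ord(ch) > 127:
                return -1                   # not valid qtext
            else:
                i += 1
        return -1                           # ran out of input before the close
    j = i
    while j < len(s) and is_atom_char(s[j]):  # atom
        j += 1
    return j if j > i else -1


def is_local_part(s):
    """True if s matches: local-part = word *("." word)."""
    i = parse_word(s, 0)
    while i != -1 and i < len(s):
        if s[i] != ".":
            return False
        i = parse_word(s, i + 1)
    return i == len(s)


if __name__ == "__main__":
    for candidate in (
        'guymacon+" http://www.guymacon.com/ "03',
        '"guymacon+\\" http://www.guymacon.com/ \\"03"',
        'guymacon." http://www.guymacon.com/ ".03',
    ):
        print(is_local_part(candidate), candidate)
```

Under these rules the original local part is rejected at the first double quote, because an atom cannot contain one and no "." separates the words, while the two rewritten forms above are accepted.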
--n
Jul 24 '05 #117

Nisse Engström wrote:

Guy Macon<_see.web.page_@_www.guymacon.com_> wrote:
guymacon+" http://www.guymacon.com/ "03@spamcop.net
[is] legal under RFC 822.


I still don't see how that address is valid.

The RFC 822 rules are:

addr-spec = local-part "@" domain
local-part = word *("." word)
word = atom / quoted-string
atom = 1*<any CHAR except specials, SPACE and CTLs>
quoted-string = <"> *(qtext/quoted-pair) <">

1. There is only one <word> because there is no "." to the
left of the "@".
2. The <word> is not an <atom>, because <atom> does not
contain <SPACE>.
3. The <word> is not a <quoted-string>, because it does
not begin and end with double quotes.
The following would be valid RFC 822 addresses (I think):

"guymacon+\" http://www.guymacon.com/ \"03"@spamcop.net
guymacon." http://www.guymacon.com/ ".03@spamcop.net


Russ Allbery made this comment a while back:

| macon+."http://www.guymacon.com/ "@example.com is legal under RFC 822, but
| not under RFC 2822. Under RFC 2822, you have to do something like
| "macon+http://www.guymacon.com/ "@example.com (in other words, quoting the
| whole string).
|
| Quoted LHS parts in e-mail addresses have an iffy reputation with MUAs; a
| lot of software authors don't bother getting this right.
|
| Russ Allbery (rr*@stanford.edu) <http://www.eyrie.org/~eagle/>

... and I just rechecked the RFCs. It looks like I was in error
*again*! It's not legal without the "." :(

Note to self: next time, smoke crack *after* posting to Usenet...


Jul 24 '05 #118

This discussion thread is closed
