Escaping characters in mailto links

Stan Brown
#1: Jul 20 '05
Can someone tell me if I am misinterpreting the spec?

Specifically, section 2.4 of RFC 2396
<file://localhost/d:/tech/internet/rfc2396.htm> defines "escaped"
characters using the %xx notation (percent sign and two hex digits).
Section 2.4.2 of the spec says that "%" "always has the reserved
purpose of being the escape indicator".

But when I replace the "l" in a mailto: with %6c, neither Mozilla
1.4 nor MSIE 4 recognizes it as a mailto link. When I replace the
"l" with a character reference such as &#108;, both browsers recognize it.

It seems unlikely both browsers would have the same two errors, but
can someone point out how I'm misreading the spec?

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
validator: http://jigsaw.w3.org/css-validator/

Jukka K. Korpela
#2: Jul 20 '05


Stan Brown <the_stan_brown@fastmail.fm> wrote:
> But when I replace the "l" in a mailto: with %6c, neither Mozilla
> 1.4 nor MSIE 4 recognizes it as a mailto link.

That's understandable, and correct, because it's not a mailto link any
more, or even a syntactically correct URL.

This is a tricky issue, since RFC 2396 is not exactly crystal clear, and
the prose seems to say that you can encode anything. (Clause 2.3 says:
"Unreserved characters can be escaped without changing the semantics of
the URI, but this should not be done unless the URI is being used in a
context that does not allow the unescaped character to appear.")
But if we read the BNF, available at
http://www.cs.tut.fi/~jkorpela/rfc/2396/full.html#A
in a hopefully more readable form than the original plain text, then it
seems that e.g. the scheme part must _not_ be encoded in any way.
It shall consist of letters, digits, and certain other characters ("+",
"-", "."), not including the percent sign, whereas the BNF for a "uri
character" (uric) explicitly includes escaped characters. So URI encoding
is allowed only where the syntax admits such characters.
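As a rough illustration of that reading of the BNF, the scheme production
can be written as a regular expression. This is only a sketch in Python;
the helper name has_valid_scheme and the example.com address are made up
for the illustration and do not come from the RFC.

```python
import re

# RFC 2396, section 3.1: scheme = alpha *( alpha | digit | "+" | "-" | "." )
# The production has no "escaped" alternative, so "%" cannot occur in a scheme.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

def has_valid_scheme(uri: str) -> bool:
    """Return True if the text before the first ':' matches the scheme production."""
    scheme, sep, _rest = uri.partition(":")
    return bool(sep) and SCHEME_RE.fullmatch(scheme) is not None

print(has_valid_scheme("mailto:someone@example.com"))    # True
print(has_valid_scheme("mai%6cto:someone@example.com"))  # False
```

The second string fails because of the "%" before the colon, which matches
the behaviour reported for both browsers.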
> When I replace the
> "l" with a character reference such as &#108;, both browsers recognize it.

Naturally, because character references operate at the HTML parsing
level.
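
To make the two layers concrete, here is a small Python sketch. It uses
html.unescape from the standard library as a stand-in for what an HTML
parser does to an attribute value before any URL handling takes place;
the address is a made-up example.

```python
import html

# An href value as written in the HTML source, with "l" as a character reference.
attribute_source = "mai&#108;to:someone@example.com"

# Character references are resolved during HTML parsing, so the URL layer
# only ever sees the decoded string.
url_seen_by_browser = html.unescape(attribute_source)
print(url_seen_by_browser)  # mailto:someone@example.com

# A percent escape is not HTML syntax, so it passes through unchanged and
# reaches the URL layer as a literal "%6c" inside the scheme.
print(html.unescape("mai%6cto:someone@example.com"))  # mai%6cto:someone@example.com
```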

(The obfuscation of URLs is a fairly ineffective weapon against spam, by
the way. And we are and will be under heavy attacks from worms that use
addresses in people's address books, so anything you do in order to "spam
protect" addresses on Web pages or Usenet postings is getting even less
relevant than it was.)

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html
