On Sun, 24 Oct 2004, Daniel R. Tobias wrote:
> Stromboli wrote:
> > I'd like to start using multiviews but I've noticed that search
> > engines don't index index.html.en but they do index index.en.html.
>
> Search engines have no way of knowing what filenames your resources
> have at your end, except to the extent you expose this through the
> URLs used in your links.
By and large you would be right, for sure; but there are at least two
comments I could make to that.
One is that of course there *could* be search engines that don't
follow the principles of HTTP, and assign a significance to the local
part of the URL (and in particular to its apparent filename
"extension") instead of looking for its MIME content-type as HTTP
calls on them to do. To that extent, any such search engine would be
defective, agreed.
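
To make the distinction concrete, here is a minimal Python sketch of the two approaches. It is not how any particular search engine works; the function names and the example.invalid URLs are made up for illustration. The first check trusts the Content-Type response header, as HTTP intends; the second guesses from the apparent filename extension, and so gets index.html.en wrong.

```python
import posixpath
from urllib.parse import urlsplit

# Media types this hypothetical indexer treats as HTML (an assumption).
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(content_type_header: str) -> str:
    """Return the bare media type, dropping parameters such as charset."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_html_by_header(content_type_header: str) -> bool:
    """The check HTTP calls for: decide from the Content-Type header."""
    return media_type(content_type_header) in HTML_TYPES


def is_html_by_extension(url: str) -> bool:
    """The defective check: guess from the last 'extension' in the URL path."""
    path = urlsplit(url).path
    return posixpath.splitext(path)[1].lower() in {".html", ".htm"}


# Both variants would be served as text/html, yet only one passes the
# extension-based guess.
header = "text/html; charset=iso-8859-1"
for url in ("http://example.invalid/index.html.en",
            "http://example.invalid/index.en.html"):
    print(url, is_html_by_header(header), is_html_by_extension(url))
```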
The other is that the HTTP response to negotiated content often has
the "real" URL (for some value of "real") included as a response
header. We might for example take this one:
$ lynx -head -dump http://ppewww.ph.gla.ac.uk/~flavell/charset/quick
HTTP/1.1 200 OK
Date: Sun, 24 Oct 2004 18:34:21 GMT
Server: Apache/1.3.26 (Unix) PHP/4.2.2
Content-Location: quick.en.html
Vary: negotiate,accept-language,accept-charset
[...]
which reveals to anyone interested that although retrieved by the
URL "quick", the negotiation actually resulted in delivery of
the URL "quick.en.html".
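
For anyone who wants to see how a client could act on that header, here is a small Python sketch. It assumes only that the client has the request URL and the raw response headers; the function names are my own invention, not part of any library. A relative Content-Location is resolved against the URL that was requested.

```python
from urllib.parse import urljoin


def parse_head_dump(dump: str) -> dict:
    """Parse 'Name: value' lines from a raw header dump into a dict
    keyed by lower-cased header name. Lines without a colon (the status
    line, elisions) are skipped."""
    headers = {}
    for line in dump.splitlines():
        name, sep, value = line.partition(":")
        if sep and " " not in name.strip():
            headers[name.strip().lower()] = value.strip()
    return headers


def negotiated_url(request_url: str, headers: dict):
    """Return the absolute URL named by Content-Location, or None if the
    response did not carry that header."""
    location = headers.get("content-location")
    if location is None:
        return None
    return urljoin(request_url, location)


dump = """HTTP/1.1 200 OK
Date: Sun, 24 Oct 2004 18:34:21 GMT
Server: Apache/1.3.26 (Unix) PHP/4.2.2
Content-Location: quick.en.html
Vary: negotiate,accept-language,accept-charset
"""
headers = parse_head_dump(dump)
print(negotiated_url("http://ppewww.ph.gla.ac.uk/~flavell/charset/quick", headers))
```

Whether a given search engine actually does anything like this with the header is exactly the open question.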
As it happens, the search engine would find href= references to both
of those URLs on my site, so it may be hard to distinguish just how
the engine works. I suppose a diagnostic test could be made by anyone
sufficiently interested (but of course, what a search engine does this
week doesn't necessarily commit it to do the same next week).
[...snip bits with which I had no argument...]
all the best