
Multiviews and multilanguage content (and search engines!)

Hi,

I have my site available in a couple of languages; most of the files
are named like indexf.html (French), indexi.html (Italian) and
indexde.html (German).

I'd like to start using multiviews but I've noticed that search
engines don't index index.html.en but they do index index.en.html.

My question is, how can I configure multiviews to deliver for example
indexf.html instead of index.html.fr? It's simpler for me to do that
than to change the whole structure of the site; besides, people
already have links pointing to those addresses. I've tried a couple
of things and none of them worked.

Cordially,
Stromboli

PS. The other thing I don't understand is why, with Apache being one of
the most popular web servers, search engines don't index Apache's
MultiViews pages. It's odd.
Jul 23 '05 #1
8 Replies


Stromboli wrote:
> Hi,
>
> I have my site available in a couple of languages most of the files
> are indexf.html (french), indexi.html (italian) and indexde.html
> (german)..

Use the standard index.lang.html file naming convention, where lang
is the standard ISO 639 language code, optionally followed by the
ISO 3166 country code where applicable.
e.g.
index.fr.html
index.it.html
index.de.html

or

index.zh-hk.html
(this example uses the ISO-3166 country code for Hong Kong)

> I'd like to start using multiviews but I've noticed that search
> engines don't index index.html.en but they do index index.en.html.

You should look up the multiviews documentation and then ask in an
Apache newsgroup. However, just because I'm in a good mood, I'll tell
you. Add this to your .htaccess file.

# The leading + adds MultiViews without switching off options already in effect.
Options +MultiViews
AddLanguage de .de
AddLanguage fr .fr
AddLanguage it .it

Or like this if you're using a country code as well.
AddLanguage zh-HK .zh-hk

> My question is, how can I configure multiviews to deliver for example
> indexf.html instead of index.html.fr?

index.html.fr is better than indexf.html, but index.fr.html is more
convenient, because neither Windows nor any editor that I know of
recognises multiple file extensions properly. Using .fr.html lets your
editor correctly recognise the file as HTML.

You need to use a .lang extension for Apache to correctly recognise the
language so the content negotiation will work correctly, and the correct
Content-Language header will be sent.

> Ps. I don't understand why ... search engines don't index apache's
> multiviews pages?


Who or what gives you the impression that they don't? All 16
languages of this Firefox campaign [1] I'm working on have been indexed
correctly. Anyway, that question is also not appropriate for an HTML
newsgroup; try an Apache or general server newsgroup.

[1] http://lachy.id.au/dev/mozilla/firef...nute/challenge

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://SpreadFirefox.com/ Igniting the Web
Jul 23 '05 #2

st********@hotpop.com (Stromboli) wrote:
> I have my site available in a couple of languages most of the files
> are indexf.html (french), indexi.html (italian) and indexde.html
> (german)..

A somewhat illogical naming scheme. Maybe you could consider adopting a
more systematic scheme, like the one you are referring to. You could then
just define permanent redirects (via .htaccess on Apache) for the old
names, so that people's links and bookmarks using them would still work.
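
The permanent redirects suggested above could be generated rather than typed by hand. What follows is only a minimal Python sketch: the suffix mapping (f, i, de) is taken from the file names in the original post, while the function name, its parameters and the example host are made up for illustration. `Redirect permanent` is the mod_alias directive; older Apache versions expect the target to be a full URL with scheme and host, hence the optional `site` argument.

```python
# Assumed mapping from the old ad-hoc suffixes to ISO 639 language codes.
OLD_SUFFIX_TO_LANG = {"f": "fr", "i": "it", "de": "de"}


def redirect_lines(basename, url_dir="/", site="", suffix_map=OLD_SUFFIX_TO_LANG):
    """Return mod_alias 'Redirect permanent' lines for one logical page.

    basename: page name without any language suffix, e.g. "index".
    url_dir:  URL path of the directory, ending in "/".
    site:     optional scheme and host to prefix to the redirect target.
    """
    lines = []
    for old_suffix, lang in sorted(suffix_map.items()):
        old_url = f"{url_dir}{basename}{old_suffix}.html"
        new_url = f"{site}{url_dir}{basename}.{lang}.html"
        lines.append(f"Redirect permanent {old_url} {new_url}")
    return lines


if __name__ == "__main__":
    # Paste the printed lines into the .htaccess file of the site root.
    print("\n".join(redirect_lines("index")))
```

Old links such as /indexf.html would then keep working while the files themselves move to the index.fr.html scheme.
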

> I'd like to start using multiviews but I've noticed that search
> engines don't index index.html.en but they do index index.en.html.


I haven't such a problem, but surely you can use index.en.html too.

On the other hand, you could use the type-map mechanism too, see
http://www.cs.tut.fi/~jkorpela/multi/6.var
This would let you keep using the current names as they are, at the cost of
needing to write a .var file for each logical page (with "logical page"
defined as a set of identical-content pages in different languages).

But as Lachlan Hunt mentioned, this isn't really an HTML matter. Just
remember to have explicit links to the other versions, no matter what you
do in language negotiation. Language negotiation very often fails, in
practical terms, since the language preferences in the user's browser do
not reflect his real language skills and preferences.
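
To make the failure mode concrete, here is a simplified Python sketch of choosing a variant from an Accept-Language header. It is not Apache's actual negotiation algorithm, only a stripped-down illustration of q-values and prefix matching; the function names and the sample headers are invented for the example.

```python
def parse_accept_language(header):
    """Parse an Accept-Language value into (language-range, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        q = 1.0  # a range without an explicit q-value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((lang.strip().lower(), q))
    # The sort is stable, so header order is kept among equal q-values.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_variant(header, available, default):
    """Pick the first available language matching the client's preferences."""
    for lang, q in parse_accept_language(header):
        if q <= 0:
            continue
        if lang == "*":
            return default
        for candidate in available:
            # Exact match, or the range is a prefix of the tag ("zh" matches "zh-hk").
            if candidate == lang or candidate.startswith(lang + "-"):
                return candidate
    return default
```

A browser left at a factory default such as "en-US,en;q=0.5" gets the site default here, whatever its user actually reads, which is why the explicit links to the other versions matter.
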

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 23 '05 #3

On Sun, 24 Oct 2004, Lachlan Hunt wrote:
> index.html.fr is better than indexf.html, but index.fr.html is more
> convenient, because no editor that I know of, nor Windows recognises
> multiple file extensions properly.

In that sense, I'd agree with you; but the flip side is that the user
cannot then reference index.html (to choose the HTML variant as
opposed to, say, .txt, .xhtml, .pdf, variants, whatever) and have the
MultiViews mechanism supply the language via preferences.

> You need to use a .lang extension for Apache to correctly recognise
> the language so the content negotiation will work correctly, and the
> correct Content-Language header will be sent.


Yes... Unless, of course, you use a type-map instead of MultiViews.

Someone with a bit of Perl expertise could probably write a script to
be run whenever the web site is updated (e.g from a Makefile), to
trawl the filenames and create the desired type-map file, for anyone
who finds the MultiViews mechanism somehow inadequate to their needs.
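
Such a script might look roughly like the following, written here in Python rather than Perl. It is a sketch under assumptions: the suffix mapping comes from the file names in the original post, the function names are invented, and only text/html variants are handled. The output follows the type-map layout from the Apache documentation (blank-line-separated entries with URI, Content-type and Content-language), and the server would still need something like `AddHandler type-map .var` for the file to be honoured.

```python
from pathlib import Path

# Assumed mapping from the existing ad-hoc suffixes to language codes.
OLD_SUFFIX_TO_LANG = {"f": "fr", "i": "it", "de": "de"}


def type_map(basename, filenames, suffix_map=OLD_SUFFIX_TO_LANG):
    """Build the text of a type-map for one logical page.

    basename:  logical page name, e.g. "index".
    filenames: names of the files found in the directory.
    """
    entries = [f"URI: {basename}"]
    for name in sorted(filenames):
        stem, _, ext = name.rpartition(".")
        if ext != "html" or not stem.startswith(basename):
            continue
        # Whatever follows the basename must be one of the known suffixes.
        lang = suffix_map.get(stem[len(basename):])
        if lang is None:
            continue
        entries.append(
            f"URI: {name}\nContent-type: text/html\nContent-language: {lang}"
        )
    return "\n\n".join(entries) + "\n"


def write_type_map(directory, basename):
    """Scan a directory and write <basename>.var next to the variants."""
    directory = Path(directory)
    names = [p.name for p in directory.iterdir() if p.is_file()]
    target = directory / f"{basename}.var"
    target.write_text(type_map(basename, names), encoding="ascii")
    return target
```

Calling `write_type_map(".", "index")` from a Makefile rule would regenerate index.var whenever the site is rebuilt.
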

Jul 23 '05 #4

Stromboli wrote:
> I'd like to start using multiviews but I've noticed that search
> engines don't index index.html.en but they do index index.en.html.


Search engines have no way of knowing what filenames your resources have
at your end, except to the extent you expose this through the URLs used
in your links. If the file is "index.html.en", but is referenced in
your site only as "index.html" or as a raw directory reference ending in
a trailing slash, so that your server performs the appropriate language
negotiation and ends up serving the resource from "index.html.en", then
it'll be indexed just as well (or poorly) as anything else, under any
other filename, that might otherwise be served as the default index
document of that directory.

More info on language coding and negotiation:
http://webtips.dan.info/language.html

--
== Dan ==
Dan's Mail Format Site: http://mailformat.dan.info/
Dan's Web Tips: http://webtips.dan.info/
Dan's Domain Site: http://domains.dan.info/
Jul 23 '05 #5

On Sun, 24 Oct 2004, Daniel R. Tobias wrote:
> Stromboli wrote:
>> I'd like to start using multiviews but I've noticed that search
>> engines don't index index.html.en but they do index index.en.html.
>
> Search engines have no way of knowing what filenames your resources
> have at your end, except to the extent you expose this through the
> URLs used in your links.


By and large you would be right, for sure; but there are at least two
comments I could make on that.

One is that of course there *could* be search engines that don't
follow the principles of HTTP, and assign a significance to the local
part of the URL (and in particular to its apparent filename
"extension") instead of looking for its MIME content-type as HTTP
calls on them to do. To that extent, any such search engine would be
defective, agreed.

The other is that the HTTP response to negotiated content often has
the "real" URL (for some value of "real") included as a response
header. We might for example take this one:

$ lynx -head -dump http://ppewww.ph.gla.ac.uk/~flavell/charset/quick
HTTP/1.1 200 OK
Date: Sun, 24 Oct 2004 18:34:21 GMT
Server: Apache/1.3.26 (Unix) PHP/4.2.2
Content-Location: quick.en.html
Vary: negotiate,accept-language,accept-charset
[...]

which reveals to anyone interested that although retrieved by the
URL "quick", the negotiation actually resulted in delivery of
the URL "quick.en.html".
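
The same check can be scripted. Below is a small Python sketch using only the standard library; the function names are invented for illustration, and the Accept-Language value sent is an arbitrary choice.

```python
import urllib.request


def negotiated_location(headers):
    """Return the Content-Location value from a header mapping, or None."""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "content-location":
            return value.strip()
    return None


def head(url, accept_language="en"):
    """Send a HEAD request and return the response headers as a dict."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"Accept-Language": accept_language}
    )
    with urllib.request.urlopen(request) as response:
        return dict(response.headers.items())
```

Something like `negotiated_location(head(url, "fr"))` would then show which variant the server picked for a French-preferring client, provided the server sends the header at all.
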

As it happens, the search engine would find href= references to both
of those URLs on my site, so it may be hard to distinguish just how
the engine works. I suppose a diagnostic test could be made by anyone
sufficiently interested (but of course, what a search engine does this
week doesn't necessarily commit it to do the same next week).

[...snip bits with which I had no argument...]

all the best
Jul 23 '05 #6

On Sun, 24 Oct 2004 19:41:27 +0100, "Alan J. Flavell"
<fl*****@ph.gla.ac.uk> wrote:
> One is that of course there *could* be search engines that don't
> follow the principles of HTTP, and assign a significance to the local
> part of the URL (and in particular to its apparent filename
> "extension") instead of looking for its MIME content-type as HTTP
> calls on them to do. To that extent, any such search engine would be
> defective, agreed.


Google is defective in this area - filetype:XYZ will demonstrate it.

Jim.
--
comp.lang.javascript FAQ - http://jibbering.com/faq/

Jul 23 '05 #7

On Sun, 24 Oct 2004, Alan J. Flavell wrote:
> One is that of course there *could* be search engines that don't
> follow the principles of HTTP, and assign a significance to the local
> part of the URL (and in particular to its apparent filename
> "extension") instead of looking for its MIME content-type as HTTP
> calls on them to do. To that extent, any such search engine would be
> defective, agreed.


Only recently, Google has begun to index my *.mac and *.win documents.
<http://www.google.com/groups?th=76ed8f112c413a50>
<http://www.google.com/groups?th=797df1a7b2bddf26>

--
Top-posting.
What's the most irritating thing on Usenet?

Jul 23 '05 #8

On Sun, 24 Oct 2004, Jukka K. Korpela wrote:
>> I'd like to start using multiviews but I've noticed that search
>> engines don't index index.html.en but they do index index.en.html.
>
> I haven't such a problem,


But you _had_ a problem with *.htm8 not being indexed by Google.

--
Top-posting.
What's the most irritating thing on Usenet?

Jul 23 '05 #9
