I recently discovered that the web server I use has started to specify
Latin-1 as the default charset, with the result that my Greek, Russian,
Persian, etc. pages failed to display properly. I had previously used
the deprecated <META ... charset ...> header tags, which worked for a
time -- presumably because the server didn't originally specify a
default charset.
My learning curve over the last few days has been quite steep: thank
you, Alan, Jukka et al (how are things, Al?) for your useful & clearly
expressed postings on this topic.
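I had assumed -- erroneously -- that charset/encoding instructions
worked something like CSS, with specifications on a web page overriding
any centrally specified default. In fact it is the other way round: a
charset sent by the server in the HTTP Content-Type header takes
precedence over one declared in a META tag.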
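A possible alternative rests on two assumptions, since hosts differ: the Latin-1 default comes from an AddDefaultCharset directive in the main server configuration, and the host permits FileInfo overrides in .htaccess. If both hold, that default can be switched off. The server then sends no charset for these pages, and browsers fall back to the META declarations:

```apache
# Stop Apache appending a default charset (e.g. ISO-8859-1) to
# text/html and text/plain responses that carry none of their own.
# Assumes AllowOverride includes FileInfo for this directory.
AddDefaultCharset Off
```

Sending an explicit charset per file is still the more robust choice, because the HTTP header takes precedence over META whenever both are present.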
FWIW, & in the hope that it may be useful for someone in the same
position, here is the (Apache) .htaccess file I finally came up with:
AddCharset UTF-8 .htm
<Files ~ "^g(reek|s|c).+\.htm$">
AddCharset Windows-1253 .htm
</Files>
<Files ~ "^ro.+\.htm$">
AddCharset Windows-1250 .htm
</Files>
<Files ~ "^ru?s.+\.htm$">
AddCharset Windows-1251 .htm
</Files>
<Files ~ "^t(ur|s).+\.htm$">
AddCharset Windows-1254 .htm
</Files>
It looks a bit messy, & if I were starting from scratch I would have
organized the files into language folders. But the file may be of
interest as a sort of template. Briefly, for the benefit of anyone
unfamiliar with the format:
1. I start by making UTF-8 the default encoding for all .htm files.
2. I then override that default for Greek, Romanian, Russian and
Turkish pages, in that order, each inside its own <Files> section.
3. I use regular expressions (the ~ after <Files> marks the quoted
pattern as one) to cover the file names for each language (of course
these should have been rationalized, but I didn't want to have to
rewrite hundreds of links!).
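For comparison, the language-folder layout mentioned above might look something like this. It is a sketch only: the folder names greek/ and russian/ are hypothetical. Each subfolder has its own .htaccess, and that file's AddCharset overrides the UTF-8 mapping inherited from the top-level file:

```apache
# Top-level .htaccess: UTF-8 for every .htm file by default
AddCharset UTF-8 .htm

# greek/.htaccess (hypothetical folder): overrides the inherited mapping
AddCharset windows-1253 .htm

# russian/.htaccess (hypothetical folder): likewise for Cyrillic pages
AddCharset windows-1251 .htm
```

No regular expressions are needed, since the folder alone decides the charset. Moving the files would, of course, mean updating the links.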
HTH someone ...
Nigel
--
ScriptMaster language resources (Chinese/Modern & Classical
Greek/IPA/Persian/Russian/Turkish):
http://www.elgin.free-online.co.uk