Character set specification and server transcoding

Hello,

I am trying to use the Shift_JIS character set in my web pages, and
have specified it as such in the <head>:

<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">

This used to work just fine, but recently I migrated all of my web
pages to a new server. Now I find that when I view the web pages they
are loaded as ISO-8859-1 in all browsers, regardless of what
fonts/OS are installed. So I have to set the encoding manually
in my browser each time I look at the page; after doing this, it looks
all right.

I read the HTML 4.0 spec and the Web Authoring FAQ and noticed that
servers can override character encodings for webpages (transcoding).
Could this be the problem? I have a small, independent host serving
my pages, so I figure I need to be well-informed about both the problem
and the solution; then I can just tell my administrator what needs
to be done. ("Accept-Charset" HTTP???)

Can someone please comment on this and verify or correct my
assumptions? I would be much obliged.

If I do run into difficulty getting changes made by my server
administrator, is there anything I can do in the web page itself to
ensure the correct character set is automatically used for my text?
Thank you so much!!
Jul 23 '05 #1


In article <ff**************************@posting.google.com >,
ve*****@hotmail.com (HeroOfSpielburg) wrote:
> I am trying to use the Shift_JIS character set in my web pages, and
> have specified it as such in the <head>:
>
> <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
How user agents should treat this META HTTP-EQUIV is ill-defined at
best[*]. On the Web, the only sensible approach is to instead configure
the server to send the appropriate Content-Type header.
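To make "configure the server to send the appropriate Content-Type header" concrete, here is a minimal sketch using Python's standard http.server module rather than Apache (the Apache-specific answer follows). The handler name, the sample page, and the port are made up for illustration. What matters is that the charset travels in the real HTTP header, and that the body bytes really are encoded in that charset:

```python
import http.server
import threading

# Hypothetical sample page; any text representable in Shift_JIS would do.
PAGE = "<html><body><p>日本語のページ</p></body></html>"


class ShiftJISHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Encode the body in the same charset that the header announces.
        body = PAGE.encode("shift_jis")
        self.send_response(200)
        # The charset goes in the real HTTP header, not (only) in a META tag.
        self.send_header("Content-Type", "text/html; charset=Shift_JIS")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_in_background(port=8000):
    """Start the demo server on a daemon thread and return it (call .shutdown() to stop)."""
    server = http.server.HTTPServer(("127.0.0.1", port), ShiftJISHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, `srv = start_in_background(8000)` serves the page at http://127.0.0.1:8000/ until `srv.shutdown()` is called; a browser should then pick Shift_JIS without any manual override.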
> This used to work just fine, but recently I migrated all of my web
> pages to a new server. Now I find that when I view the web pages they
> are loaded as ISO-8859-1 in all browsers
Configure the server to send the appropriate Content-Type header. How
exactly to do that will of course depend on the server.

If it's Apache, you should still be able to configure this even if
you're not the admin. You'd typically create a plain-text file named
".htaccess" in the root of your Web directory, containing the line

AddType 'text/html; charset=Shift_JIS' html

(Assuming "Shift_JIS" is correct. I don't know, I only use latin-1 and
utf-8.)

See <http://httpd.apache.org>.
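To check what the server is actually sending (for instance after adding the .htaccess line above, which on Apache only takes effect if the host permits such overrides, e.g. via AllowOverride FileInfo), look at the Content-Type response header rather than at the page source. A minimal Python sketch; both function names are hypothetical:

```python
from email.message import Message
from urllib.request import urlopen


def charset_from_content_type(value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def served_charset(url):
    """Fetch url and report the charset declared in the real HTTP header (or None)."""
    with urlopen(url) as response:
        return response.headers.get_content_charset()
```

Calling `served_charset("http://your.host/page.html")` (a placeholder URL) should return "shift_jis" once the header is fixed, rather than "iso-8859-1".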

[...]
> I read the HTML 4.0 spec and the Web Authoring FAQ and noticed that
> servers can override character encodings for webpages (transcoding).
Not "can override", rather "must mention". There are no characters in a
Web page. There are codes. To change the codes into characters, the user
agent looks them up in the appropriate table. Therefore the server must
inform the user agent which table applies to the codes in the current
document. That's what all this 'charset stuff' is about.
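The "codes, not characters" point can be shown with a few lines of Python: the same bytes become Japanese text when looked up in the Shift_JIS table, and mojibake when looked up in the ISO-8859-1 table, which is essentially what the original poster was seeing. The sample string is arbitrary:

```python
# The server stores and sends bytes; the charset label tells the browser which table to use.
text = "日本語"
encoded = text.encode("shift_jis")

as_shift_jis = encoded.decode("shift_jis")  # correct table: the original characters
as_latin1 = encoded.decode("iso-8859-1")    # wrong table: each byte maps to some other character

print(encoded)
print(as_shift_jis)
print(as_latin1)
```

Note that decoding as ISO-8859-1 never raises an error, because every byte value has an entry in that table; the browser has no way to notice the mistake on its own.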

[...]
> If I do run into difficulty getting changes made by my server
> administrator, is there anything I can do in the web page itself to
> ensure the correct character set is automatically used for my text?


No. In such a case the only sensible approach would be to change to a
Web server/admin that does offer the bare minimum requirements to run a
reliable Web site. (Many people instead resort to a META HTTP-EQUIV, but
as you've seen, that doesn't work. At best it will sometimes give the
wanted result, by chance.)

[*] As I found when investigating the Navigator bug that Alan Flavell
named the "charset burp":
<http://www.euronet.nl/~tekelenb/WWW/netscapebug/>.

--
Sander Tekelenburg, <http://www.euronet.nl/%7Etekelenb/>
Jul 23 '05 #2

On Sun, 23 Jan 2005, HeroOfSpielburg wrote:
> <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
>
> This used to work just fine, but recently I migrated all of my web
> pages to a new server. Now I find that when I view the web pages they
> are loaded as ISO-8859-1 in all browsers,
Evidently your new server is set to put charset=iso-8859-1 onto the
HTTP header of outgoing pages by default. That real HTTP header takes
precedence over any meta http-equiv that might be found in the
document itself.
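The precedence rule can be expressed as a simplified Python sketch. It assumes only two sources of information (the HTTP header, then a META charset declaration near the top of the document) plus a fallback default; real browsers apply a longer detection procedure, so the function name, the regular expression, and the default are illustrative assumptions. The ISO-8859-1 default mirrors HTTP/1.1's historical default for text types:

```python
import re
from email.message import Message

# Rough pattern for charset=... inside a <meta> tag; not a full HTML parser.
META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE)


def effective_charset(content_type_header, body, default="iso-8859-1"):
    """Roughly pick the charset a conforming user agent would use for body (simplified)."""
    # 1. The real HTTP header wins whenever it carries a charset parameter.
    if content_type_header:
        msg = Message()
        msg["Content-Type"] = content_type_header
        header_charset = msg.get_content_charset()
        if header_charset:
            return header_charset
    # 2. Otherwise fall back to a META declaration in the first bytes of the document.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Otherwise use the default.
    return default
```

With the new server's header (`text/html; charset=ISO-8859-1`), the Shift_JIS META in the page is simply never consulted, which matches what the original poster observed.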

What you'll need to do is find out how to get other charset values put
onto the server's outgoing headers: the details depend on the actual
server configuration that your service provider offers.
> regardless of what fonts/OS are installed.
Absolutely! The WWW isn't supposed to display different characters
just because a different font or OS is installed: the most that should
happen is that the characters look cosmetically different.
> So I have to set the encoding manually in my browser each time
> I look at the page; after doing this, it looks all right.
Yeah, but that's a workaround for the problem - not the proper
solution.
> I read the HTML 4.0 spec and the Web Authoring FAQ and noticed that
> servers can override character encodings for webpages (transcoding).
Hang on, I think you'd got this a bit confused.

"Transcoding" means that the server picks up your document encoded in,
let's say, shift_JIS, and translates it into some other encoding,
let's say utf-8 or iso-2022-JP2 or something, and sends the resulting
"translated encoding" (= transcoding) to the client.

In such a situation, it would be absolutely wrong to tell the client
that the document was encoded in shift_JIS, which by now it certainly
isn't.
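A minimal Python sketch of what transcoding means, assuming the author's file is in Shift_JIS and the server decides to send UTF-8 instead (the function name is hypothetical): the bytes change, so the label in the header has to change with them.

```python
def transcode(body, source_charset, target_charset):
    """Re-encode body from source_charset to target_charset and build a matching header."""
    text = body.decode(source_charset)      # codes -> characters, using the author's table
    new_body = text.encode(target_charset)  # characters -> codes, using the target table
    header = f"text/html; charset={target_charset}"
    return new_body, header


original = "<p>日本語</p>".encode("shift_jis")
sent, content_type = transcode(original, "shift_jis", "utf-8")
print(content_type)  # prints: text/html; charset=utf-8
```

Sending `sent` with a header still claiming Shift_JIS would be exactly the mislabelling described above.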

The classic example of transcoding is Russian Apache, since Russian
has used a number of different incompatible encodings in the past, and
some browsers did not implement all of them. So Russian Apache had
(and presumably still has) an on-the-fly transcoding module, which
takes the author's document in one encoding, and - if necessary -
transcodes it to something that the client is willing to display.

However, I don't think you really want to get involved in that at this
stage.[1]

Instead, take a look at http://www.w3.org/International/O-HTTP-charset

For Apache, what you need are appropriate AddCharset and/or
AddDefaultCharset directives.
> I have a small, independent host serving my pages, so I figure I
> need to be well-informed about both the problem and the solution;
> then I can just tell my administrator what needs to be done.
> ("Accept-Charset" HTTP???)
Not exactly, no. Accept-Charset is an HTTP request header from the
client, and is something that would be useful if you offered variants
of the same document in different character encodings (whether
statically, or by "transcoding"). But that's not your present
problem.
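For completeness, a rough Python sketch of how a server holding several variants (or able to transcode, as Russian Apache does) might use the client's Accept-Charset request header to choose one. The parsing is simplified (q-values only, "*" meaning "anything else"), it ignores the old RFC 2616 special case for ISO-8859-1, and the function name and candidate list are assumptions, not Apache's actual algorithm:

```python
def choose_charset(accept_charset, available):
    """Return the available charset with the highest q-value (first wins on ties), or None."""
    if not accept_charset:
        # No preference stated: any variant will do.
        return available[0] if available else None
    prefs = {}
    for item in accept_charset.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs[name] = q
    best, best_q = None, 0.0
    for cs in available:
        # Charsets not named explicitly fall back to the "*" entry, if any.
        q = prefs.get(cs.lower(), prefs.get("*", 0.0))
        if q > best_q:
            best, best_q = cs, q
    return best
```

For example, a client sending `koi8-r, utf-8;q=0.5` to a server offering windows-1251, koi8-r and utf-8 variants would get the koi8-r one.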
> If I do run into difficulty getting changes made by my server
> administrator, is there anything I can do in the web page itself to
> ensure the correct character set is automatically used for my text?


No. As a matter of principle, the HTTP Content-Type header takes
priority over other sources of information about character encoding.
See other recent postings on this group about the same topic (and
security alert CA-2000-02 if you want to get technical).

Some browsers may have a user-configurable way of overriding that *as
a workaround for errors*, but you should never try to rely on user
workarounds instead of doing the job properly on the server side.

Have fun.

[1] http://apache.lexa.ru/english/meta-http-eng.html if you're
interested anyway - but that page is a bit old now, it talks about
popular browsers wrongly allowing meta http-equiv to override the
real HTTP header, but modern browsers now usually do what the spec
says they have to do, i.e. the real HTTP header takes priority.
Jul 23 '05 #3

Hi, I created the .htaccess file as you suggested and it worked!
Thanks a lot for the help, this is great!! :D

Jul 23 '05 #4

Thanks for the advice. My eyes did glaze over a little when reading
about transcoding the first time. I didn't realize the communication
between a browser and a server was this sophisticated. It definitely
makes me appreciate what goes on behind the page rendering. :)

Jul 23 '05 #5
