Character set specification and server transcoding

Hello,

I am trying to use the Shift_JIS character set in my web pages, and
have specified it as such in the <head>.

<meta http-equiv="Content-Type" content="text/html;
charset=Shift_JIS">

This used to work just fine, but recently I migrated all of my web
pages to a new server. Now I find that when I view the web pages they
are loaded with ISO-8859-1 on all browsers, regardless of what
fonts/OS are installed. So then I have to manually set the encoding
in my browser each time I look at the page; after doing this it looks
all right.

I read the HTML 4.0 spec and the Web Authoring FAQ and noticed that
servers can override character encodings for webpages (transcoding).
Could this be the problem? I have a small, independent host serving
my pages, so I figure I have to be well-informed about the problem as
well as the solution, then I can just tell my administrator what needs
to be done. ("Accept-Charset" HTTP???)

Can someone please comment on this and verify or correct my
assumptions? I would be much obliged.

If I do run into difficulty getting changes made by my server
administration, is there anything in the client web page I can do to
ensure the correct character set is automatically used for my text?
Thank you so much!!
Jul 23 '05 #1
In article <ff**************************@posting.google.com >,
ve*****@hotmail.com (HeroOfSpielburg) wrote:
> I am trying to use the Shift_JIS character set in my web pages, and
> have specified it as such in the <head>.
>
> <meta http-equiv="Content-Type" content="text/html;
> charset=Shift_JIS">

How user agents should treat this META HTTP-EQUIV is ill-defined at
best[*]. On the Web, the only sensible approach is to instead configure
the server to send the appropriate Content-Type header.

> This used to work just fine, but recently I migrated all of my web
> pages to a new server. Now I find that when I view the web pages they
> are loaded with ISO-8859-1 on all browsers

Configure the server to send the appropriate Content-Type header. How
exactly to do that will of course depend on the server.

If Apache, and if you're not the admin, you should still be able to
configure this. You'd typically create a (text) file ".htaccess" in the
root of your Web directory, which reads

AddType 'text/html; charset=Shift_JIS' html

(Assuming "Shift_JIS" is correct. I don't know, I only use latin-1 and
utf-8.)

See <http://httpd.apache.org>.
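
To make the effect of such a directive concrete, here is a minimal Python sketch that stands in for the configured server. It is not Apache and not the poster's host: the handler class, the `demo` helper and the sample text are assumptions made up for illustration. A throwaway local server sends the same `Content-Type` value that the AddType line asks for, and a client reads the declared charset back from the response header before decoding.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sample page text: three kanji, written as escapes to keep this file ASCII-only.
PAGE_TEXT = "<p>\u65e5\u672c\u8a9e</p>"


class ShiftJISHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a server configured to label pages as Shift_JIS."""

    def do_GET(self):
        body = PAGE_TEXT.encode("shift_jis")
        self.send_response(200)
        # This is the header the .htaccess line is meant to produce.
        self.send_header("Content-Type", "text/html; charset=Shift_JIS")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_with_declared_charset(host, port, path="/"):
    """Return (declared charset, text decoded with it) for one GET request."""
    conn = HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        charset = response.headers.get_content_charset()
        raw = response.read()
    finally:
        conn.close()
    # Fall back to ISO-8859-1 when the server declares nothing.
    return charset, raw.decode(charset or "iso-8859-1")


def demo():
    server = HTTPServer(("127.0.0.1", 0), ShiftJISHandler)  # port 0: pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_with_declared_charset("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Pointing `fetch_with_declared_charset` at a real host is one way to see which charset, if any, that server currently declares.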

[...]
> I read the HTML 4.0 spec and the Web Authoring FAQ and noticed that
> servers can override character encodings for webpages (transcoding).

Not "can override", rather "must mention". There are no characters in a
Web page. There are codes. To change the codes into characters, the user
agent looks them up in the appropriate table. Therefore the server must
inform the user agent which table applies to the codes in the current
document. That's what all this 'charset stuff' is about.
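
The point about codes and tables can be seen in a few lines of Python. This is only a sketch with a made-up sample string: the same six bytes are looked up once in the Shift_JIS table and once in the ISO-8859-1 table, which is roughly what a browser does when it is told, or assumes, the wrong charset.

```python
# Sample text: three kanji, written as escapes to keep this file ASCII-only.
SAMPLE = "\u65e5\u672c\u8a9e"


def decode_with(raw, charset):
    """Look the byte codes up in the table named by `charset`."""
    return raw.decode(charset)


# The document as stored on disk and sent over the wire: just bytes.
raw = SAMPLE.encode("shift_jis")

as_declared = decode_with(raw, "shift_jis")   # right table: 3 characters
as_default = decode_with(raw, "iso-8859-1")   # wrong table: 6 unrelated characters

print(len(raw), "bytes on the wire")
print(repr(as_declared))
print(repr(as_default))
```

ISO-8859-1 assigns a character to every byte value, so the wrong lookup never fails; it silently produces different text, which matches the symptom described in the original post.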

[...]
> If I do run into difficulty getting changes made by my server
> administration, is there anything in the client web page I can do to
> ensure the correct character set is automatically used for my text?

No. In such a case the only sensible approach would be to change to a
Web server/admin that does offer the bare minimum requirements to run a
reliable Web site. (Many people instead resort to a META HTTP-EQUIV, but
as you've seen that doesn't work. At best, by chance it will sometimes
give the wanted result, by chance.)

[*] As I found when investigating the Navigator bug that Alan Flavell
named the "charset burp":
<http://www.euronet.nl/~tekelenb/WWW/netscapebug/>.

--
Sander Tekelenburg, <http://www.euronet.nl/%7Etekelenb/>
Jul 23 '05 #2
On Sun, 23 Jan 2005, HeroOfSpielburg wrote:
> <meta http-equiv="Content-Type" content="text/html;
> charset=Shift_JIS">
>
> This used to work just fine, but recently I migrated all of my web
> pages to a new server. Now I find that when I view the web pages they
> are loaded with ISO-8859-1 on all browsers,

Evidently your new server is set to put charset=iso-8859-1 onto the
HTTP header of outgoing pages by default. That real HTTP header takes
precedence over any meta http-equiv that might be found in the
document itself.
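
The precedence rule can be sketched in Python as follows. The function names and the regular expression are assumptions made up for this example; real browsers follow a more involved detection algorithm, so treat it as an illustration of the ordering only: a charset on the real HTTP header wins, and the meta element is consulted only when the header names none.

```python
import re
from email.message import Message

# Rough pattern for a charset inside a <meta ...> tag; an assumption for
# this sketch, not a full HTML parser.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def header_charset(content_type):
    """Charset parameter of a Content-Type header value, lowercased, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(html_bytes):
    """Charset named by a meta element near the start of the document, or None."""
    match = META_CHARSET.search(html_bytes[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type, html_bytes):
    """The HTTP header takes priority; meta is only a fallback."""
    return header_charset(content_type) or meta_charset(html_bytes)


page = (b'<head><meta http-equiv="Content-Type" content="text/html;\n'
        b'charset=Shift_JIS"></head>')

# New server: header says iso-8859-1, so the meta element is ignored.
print(effective_charset("text/html; charset=iso-8859-1", page))
# Server that sends no charset: the meta element is all there is.
print(effective_charset("text/html", page))
```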

What you'll need to do is find out how to get other charset values put
onto the server's outgoing headers: the details depend on the actual
server configuration that your service provider offers.

> regardless of what fonts/OS are installed.

Absolutely! The WWW isn't supposed to display different characters
just because a different font or OS is installed: the most that should
happen is that the characters look cosmetically different.

> So then I have to manually set the encoding in my browser each time
> I look at the page; after doing this it looks all right.

Yeah, but that's a workaround for the problem - not the proper
solution.

> I read the HTML 4.0 spec and the Web Authoring FAQ and noticed that
> servers can override character encodings for webpages (transcoding).

Hang on, I think you've got this a bit confused.

"Transcoding" means that the server picks up your document encoded in,
let's say, shift_JIS, and translates it into some other encoding,
let's say utf-8 or iso-2022-JP2 or something, and sends the resulting
"translated encoding" (= transcoding) to the client.

In such a situation, it would be absolutely wrong to tell the client
that the document was encoded in shift_JIS, which by now it certainly
isn't.
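
As a minimal Python sketch of that idea (the function names are made up here, and this is not how Russian Apache or any particular server implements it): transcoding changes the bytes, so the charset label sent with them has to change too.

```python
# Sample text: three kanji, written as escapes to keep this file ASCII-only.
SAMPLE = "\u65e5\u672c\u8a9e"


def transcode(body, source_charset, target_charset):
    """Re-encode a document body from one character encoding to another."""
    return body.decode(source_charset).encode(target_charset)


def transcoded_response(body, source_charset, target_charset="utf-8"):
    """Return (Content-Type value, body) describing the bytes actually sent."""
    new_body = transcode(body, source_charset, target_charset)
    # The label must name the encoding of new_body, not of the file on disk.
    return "text/html; charset=" + target_charset, new_body


stored = SAMPLE.encode("shift_jis")              # what the author uploaded
content_type, sent = transcoded_response(stored, "shift_jis")

print(content_type)
print(len(stored), "bytes stored,", len(sent), "bytes sent")
```

Decoding `sent` as Shift_JIS would no longer give back the original text, which is why announcing the author's original charset after transcoding would be wrong.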

The classic example of transcoding is Russian Apache, since Russian
has used a number of different incompatible encodings in the past, and
some browsers did not implement all of them. So Russian Apache had
(and presumably still has) an on-the-fly transcoding module, which
takes the author's document in one encoding, and - if necessary -
transcodes it to something that the client is willing to display.

However, I don't think you really want to get involved in that at this
stage.[1]

Instead, take a look at http://www.w3.org/International/O-HTTP-charset

For Apache, what you need are appropriate AddCharset and/or
AddDefaultCharset directives.

> I have a small, independent host serving my pages, so I figure I
> have to be well-informed about the problem as well as the solution,
> then I can just tell my administrator what needs to be done.
> ("Accept-Charset" HTTP???)

Not exactly, no. Accept-charset is an HTTP request header from the
client, and is something that would be useful if you offered variants
of the same document in different character encodings (whether
statically, or by "transcoding"). But that's not your present
problem.
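
For the curious, here is a rough Python sketch of how a server offering several variants might use that request header. The parser and the `choose_variant` helper are assumptions written for this example, and they skip details of the HTTP specification such as special default treatment of ISO-8859-1; the idea is only that the client lists charsets with optional q-values and the server picks the best available match.

```python
def parse_accept_charset(header):
    """Parse 'shift_jis;q=0.5, utf-8' into [('shift_jis', 0.5), ('utf-8', 1.0)]."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        quality = 1.0  # q defaults to 1 when it is not given
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        prefs.append((name.strip().lower(), quality))
    return prefs


def choose_variant(header, available):
    """Pick the available charset the client prefers most, or None."""
    prefs = dict(parse_accept_charset(header))
    wildcard = prefs.get("*", 0.0)  # '*' covers charsets not listed by name
    best, best_q = None, 0.0
    for charset in available:
        quality = prefs.get(charset.lower(), wildcard)
        if quality > best_q:
            best, best_q = charset, quality
    return best


print(choose_variant("shift_jis;q=0.5, utf-8", ["Shift_JIS", "UTF-8"]))
print(choose_variant("iso-2022-jp, *;q=0.1", ["Shift_JIS"]))
print(choose_variant("utf-8", ["Shift_JIS"]))
```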

> If I do run into difficulty getting changes made by my server
> administration, is there anything in the client web page I can do to
> ensure the correct character set is automatically used for my text?

No. As a matter of principle, the HTTP Content-type header takes
priority over other sources of information about character encoding.
See other recent postings on this group about the same topic (and
security alert CA-2000-02 if you want to get technical).

Some browsers may have a user-configurable way of overriding that *as
a workaround for errors*, but you should never try to rely on user
workarounds instead of doing the job properly on the server side.

Have fun.

[1] http://apache.lexa.ru/english/meta-http-eng.html if you're
interested anyway - but that page is a bit old now, it talks about
popular browsers wrongly allowing meta http-equiv to override the
real HTTP header, but modern browsers now usually do what the spec
says they have to do, i.e. the real HTTP header takes priority.
Jul 23 '05 #3
Hi, I created the .htaccess file as you suggested and it worked!
Thanks a lot for the help, this is great!! :D

Jul 23 '05 #4
Thanks for the advice. My eyes did glaze over a little when reading
about transcoding the first time. I didn't realize the communication
between a browser and a server was this sophisticated. It definitely
makes me appreciate what goes on behind the page rendering. :)

Jul 23 '05 #5
