Character set specification and server transcoding

Hello,

I am trying to use the Shift_JIS character set in my web pages, and
have specified it as such in the <head>.

<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">

This used to work just fine, but recently I migrated all of my web
pages to a new server. Now I find that when I view the web pages they
are loaded with ISO-8859-1 on all browsers, regardless of what
fonts/OS are installed. So I have to set the encoding manually in my
browser each time I look at the page; after that, it looks all right.

I read the HTML 4.0 spec and the Web Authoring FAQ and noticed that
servers can override character encodings for webpages (transcoding).
Could this be the problem? I have a small, independent host serving
my pages, so I figure I have to be well-informed about the problem as
well as the solution, then I can just tell my administrator what needs
to be done. ("Accept-Charset" HTTP???)

Can someone please comment on this and verify or correct my
assumptions? I would be much obliged.

If I do run into difficulty getting changes made by my server
administration, is there anything in the client web page I can do to
ensure the correct character set is automatically used for my text?
Thank you so much!!
Jul 23 '05 #1
In article <ff**************************@posting.google.com>,
ve*****@hotmail.com (HeroOfSpielburg) wrote:
I am trying to use the Shift_JIS character set in my web pages, and
have specified it as such in the <head>.

<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
How user agents should treat this META HTTP-EQUIV is ill-defined at
best[*]. On the Web, the only sensible approach is to instead configure
the server to send the appropriate Content-Type header.
This used to work just fine, but recently I migrated all of my web
pages to a new server. Now I find that when I view the web pages they
are loaded with ISO-8859-1 on all browsers
Configure the server to send the appropriate Content-Type header. How
exactly to do that will of course depend on the server.

If Apache, and if you're not the admin, you should still be able to
configure this. You'd typically create a (text) file ".htaccess" in the
root of your Web directory, which reads

AddType 'text/html; charset=Shift_JIS' html

(Assuming "Shift_JIS" is correct. I don't know, I only use latin-1 and
utf-8.)

See <http://httpd.apache.org>.

[...]
I read the HTML 4.0 spec and the Web Authoring FAQ and noticed that
servers can override character encodings for webpages (transcoding).
Not "can override", rather "must mention". There are no characters in a
Web page. There are codes. To change the codes into characters, the user
agent looks them up in the appropriate table. Therefore the server must
inform the user agent which table applies to the codes in the current
document. That's what all this 'charset stuff' is about.
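
To make the lookup-table point concrete, here is a minimal Python sketch. It is not from the thread, and the sample string is only an illustration. The same six bytes come out as Japanese text when looked up in the Shift_JIS table and as unrelated Latin-1 characters when looked up in ISO-8859-1, which is presumably what the original poster sees on the new server.

```python
# The same bytes, looked up in two different tables.
text = "日本語"                       # illustrative sample text (an assumption)
raw = text.encode("shift_jis")        # the "codes" the server actually sends

as_sjis = raw.decode("shift_jis")     # correct table: original text comes back
as_latin1 = raw.decode("iso-8859-1")  # wrong table: one junk character per byte

print(len(raw), "bytes on the wire")
print("decoded as Shift_JIS: ", as_sjis)
print("decoded as ISO-8859-1:", repr(as_latin1))
```

ISO-8859-1 assigns a character to every byte value, so the wrong lookup never fails outright. It just silently produces the wrong characters.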

[...]
If I do run into difficulty getting changes made by my server
administration, is there anything in the client web page I can do to
ensure the correct character set is automatically used for my text?


No. In such a case the only sensible approach would be to change to a
Web server/admin that does offer the bare minimum requirements to run a
reliable Web site. (Many people instead resort to a META HTTP-EQUIV, but
as you've seen that doesn't work. At best, it will sometimes give the
wanted result by chance.)

[*] As I found when investigating the Navigator bug that Alan Flavell
named the "charset burp":
<http://www.euronet.nl/~tekelenb/WWW/netscapebug/>.

--
Sander Tekelenburg, <http://www.euronet.nl/%7Etekelenb/>
Jul 23 '05 #2
On Sun, 23 Jan 2005, HeroOfSpielburg wrote:
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">

This used to work just fine, but recently I migrated all of my web
pages to a new server. Now I find that when I view the web pages they
are loaded with ISO-8859-1 on all browsers,
Evidently your new server is set to put charset=iso-8859-1 onto the
HTTP header of outgoing pages by default. That real HTTP header takes
precedence over any meta http-equiv that might be found in the
document itself.
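
The precedence rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `effective_charset` and `META_CHARSET` are hypothetical names, and the regular expression is a rough stand-in for what a browser does, not the real HTML encoding-detection algorithm.

```python
import re
from email.message import Message

# Rough stand-in for a browser's scan of the document head (an assumption,
# not the actual algorithm any browser uses).
META_CHARSET = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)

def charset_from_content_type(value):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()

def effective_charset(http_content_type, html_bytes):
    """The real HTTP header wins; the meta element is consulted only as a fallback."""
    if http_content_type:
        from_header = charset_from_content_type(http_content_type)
        if from_header:
            return from_header
    match = META_CHARSET.search(html_bytes[:1024])
    return match.group(1).decode("ascii").lower() if match else None

page = b'<head><meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS"></head>'
print(effective_charset("text/html; charset=iso-8859-1", page))  # header overrides meta
print(effective_charset("text/html", page))                      # no header charset: meta is used
```

With the new server's default header the result is iso-8859-1 whatever the meta element says, which matches the symptom described above.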

What you'll need to do is find out how to get other charset values put
onto the server's outgoing headers: the details depend on the actual
server configuration that your service provider offers.
regardless of what fonts/OS are installed.
Absolutely! The WWW isn't supposed to display different characters
just because a different font or OS is installed: the most that should
happen is that the characters look cosmetically different.
So then I have to manually set the encoding in my browser each time
I look at the page, after doing this it looks all right.
Yeah, but that's a workaround for the problem - not the proper
solution.
I read the HTML 4.0 spec and the Web Authoring FAQ and noticed that
servers can override character encodings for webpages (transcoding).
Hang on, I think you've got this a bit confused.

"Transcoding" means that the server picks up your document encoded in,
let's say, shift_JIS, and translates it into some other encoding,
let's say utf-8 or iso-2022-JP2 or something, and sends the resulting
"translated encoding" (= transcoding) to the client.

In such a situation, it would be absolutely wrong to tell the client
that the document was encoded in shift_JIS, which by now it certainly
isn't.
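
As a minimal Python sketch of that idea (illustrative only; the `transcode` helper and the sample text are assumptions, and this is not how Russian Apache or any particular server implements it): the body is re-encoded, and the charset label sent with it has to name the new encoding.

```python
def transcode(body, source, target):
    """Re-encode a document body from one character encoding to another."""
    return body.decode(source).encode(target)

stored = "日本語".encode("shift_jis")            # what the author uploaded
sent = transcode(stored, "shift_jis", "utf-8")   # what goes out on the wire
header = "text/html; charset=utf-8"              # label must describe `sent`, not `stored`

print(len(stored), "bytes stored,", len(sent), "bytes sent")
print(sent.decode("utf-8"))
```

Labelling `sent` as Shift_JIS would make the client use the wrong table for bytes that are now UTF-8.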

The classic example of transcoding is Russian Apache, since Russian
has used a number of different incompatible encodings in the past, and
some browsers did not implement all of them. So Russian Apache had
(and presumably still has) an on-the-fly transcoding module, which
takes the author's document in one encoding, and - if necessary -
transcodes it to something that the client is willing to display.

However, I don't think you really want to get involved in that at this
stage.[1]

Instead, take a look at http://www.w3.org/International/O-HTTP-charset

For Apache, what you need are appropriate AddCharset and/or
AddDefaultCharset directives.
I have a small, independent host serving my pages, so I figure I
have to be well-informed about the problem as well as the solution,
then I can just tell my administrator what needs to be done.
("Accept-Charset" HTTP???)
Not exactly, no. Accept-charset is an HTTP request header from the
client, and is something that would be useful if you offered variants
of the same document in different character encodings (whether
statically, or by "transcoding"). But that's not your present
problem.
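
For the curious, a simplified Python sketch of how a server offering several variants might use that request header. The function names are hypothetical, and the parsing deliberately ignores corner cases of the HTTP specification (for example, the special default treatment of iso-8859-1).

```python
def parse_accept_charset(value):
    """Parse an Accept-Charset value into {charset: q}, defaulting q to 1.0."""
    prefs = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        params = params.strip().lower()
        q = 1.0
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        prefs[name.strip().lower()] = q
    return prefs

def choose_variant(accept_charset, available):
    """Pick the available encoding the client prefers, or None if none is acceptable."""
    prefs = parse_accept_charset(accept_charset)
    if not prefs:                      # no preference stated: anything goes
        return available[0]
    default_q = prefs.get("*", 0.0)    # "*" covers charsets not listed explicitly
    best = max(available, key=lambda cs: prefs.get(cs.lower(), default_q))
    return best if prefs.get(best.lower(), default_q) > 0 else None

print(choose_variant("Shift_JIS, utf-8;q=0.5", ["utf-8", "shift_jis"]))
```

None of this helps when only one variant exists and it is simply mislabelled, which is the situation here.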
If I do run into difficulty getting changes made by my server
administration, is there anything in the client web page I can do to
ensure the correct character set is automatically used for my text?


No. As a matter of principle, the HTTP Content-type header takes
priority over other sources of information about character encoding.
See other recent postings on this group about the same topic (and
security alert CA-2000-02 if you want to get technical).

Some browsers may have a user-configurable way of overriding that *as
a workaround for errors*, but you should never try to rely on user
workarounds instead of doing the job properly on the server side.

Have fun.

[1] http://apache.lexa.ru/english/meta-http-eng.html if you're
interested anyway - but that page is a bit old now, it talks about
popular browsers wrongly allowing meta http-equiv to override the
real HTTP header, but modern browsers now usually do what the spec
says they have to do, i.e. the real HTTP header takes priority.
Jul 23 '05 #3
Hi, I created the .htaccess file as you suggested and it worked!
Thanks a lot for the help, this is great!! :D

Jul 23 '05 #4
Thanks for the advice. My eyes did glaze over a little when reading
about transcoding the first time. I didn't realize the communication
between a browser and a server was this sophisticated. It definitely
makes me appreciate what goes on behind the page rendering. :)

Jul 23 '05 #5
