"Joerg Jooss" <jo*********@gmx.net> wrote in message
news:ep**************@TK2MSFTNGP12.phx.gbl...
Check out this very nice article:
http://www.joelonsoftware.com/articles/Unicode.html
That is, indeed, a very nice article. It does have one problem, however. It
implies that using UTF-8 for all web pages is fine because most browsers
support UTF-8. The browser support is real, but UTF-8 significantly inflates
the byte count for some languages; Chinese is a good example. In my opinion
page size still matters, and in many cases you can substantially reduce it by
choosing an encoding that matches the primary language of the page.
For example, a typical block of Chinese text takes about one and a half times
as much space in UTF-8 as it does in Big5: most Chinese characters need three
bytes in UTF-8 but only two in Big5. Characters that don't exist in Big5 can
be written as &# numeric character references. Browsers that people use to
read Chinese are very likely to support Big5, so in my opinion you should use
Big5 for Traditional Chinese pages (GB2312 plays the same role for Simplified
Chinese). ASP.NET makes this very easy to do. Using Big5 conserves Internet
bandwidth, saves space in proxy servers and in your local cache, reduces
download times for those unfortunate modem and ISDN users, and so on.
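A minimal sketch in Python of the byte-count comparison and the &# fallback, assuming Python 3's built-in "utf-8" and "big5" codecs stand in for whatever the web server actually uses. The helper name `encode_for_page` is hypothetical. The standard "xmlcharrefreplace" error handler turns any character the target encoding cannot represent into a decimal &#NNNN; reference, which browsers render as the original character.

```python
def encode_for_page(text: str, encoding: str) -> bytes:
    """Encode page text, writing unrepresentable characters as &#NNNN; references.

    `encode_for_page` is a hypothetical helper used only for illustration.
    """
    return text.encode(encoding, errors="xmlcharrefreplace")


sample = "中文網頁"  # four Traditional Chinese characters, all present in Big5

utf8_bytes = encode_for_page(sample, "utf-8")
big5_bytes = encode_for_page(sample, "big5")
print(len(utf8_bytes), len(big5_bytes))  # prints: 12 8

# A character outside Big5 (an emoji here) falls back to a numeric reference.
print(encode_for_page("中😀", "big5"))
```

The fallback keeps the page correct at the cost of a few extra ASCII bytes per missing character, which is a good trade as long as such characters are rare.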
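In web pages there are going to be many ASCII characters (HTML tags and so
forth) mixed in with the Chinese, and those cost one byte in either encoding,
so your actual savings will be smaller than the per-character figure
suggests. For pages that are mostly Chinese content, though, the savings will
still be significant. I'm just using Chinese as an example. Pick most
languages written in a non-Latin script and the result will likely be
similar, since UTF-8 needs two or three bytes per character for them while a
regional legacy encoding typically needs only one or two.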
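To see how ASCII markup dilutes the savings, here is a rough Python sketch that measures a small, made-up HTML fragment. The fragment and the `savings_ratio` name are hypothetical, and real pages will vary with how much markup, script, and CSS they carry.

```python
def savings_ratio(page: str, legacy_encoding: str = "big5") -> float:
    """Return the UTF-8 size of `page` divided by its legacy-encoding size.

    ASCII characters cost one byte in both encodings, so the more markup a
    page carries, the closer this ratio gets to 1.0.
    """
    utf8_size = len(page.encode("utf-8"))
    legacy_size = len(page.encode(legacy_encoding, errors="xmlcharrefreplace"))
    return utf8_size / legacy_size


chinese = "中文網頁" * 10  # 40 Chinese characters
pure_text = chinese
with_markup = '<div class="article"><p>' + chinese + "</p></div>"

print(f"pure text:   {savings_ratio(pure_text):.2f}")    # pure text:   1.50
print(f"with markup: {savings_ratio(with_markup):.2f}")  # with markup: 1.35
```

As long as the Chinese characters all exist in Big5, markup only pulls the ratio toward 1.0; it never makes the Big5 page larger than the UTF-8 one.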