
what is the recommended "lang" for Chinese simplified?

Hello. I am working with a php software project; in it
(www.egroupware.org) the Chinese simplified locale is "zh" while Traditional
Chinese is "tw".

I wish to send the correct language attribute in the http header, and I found
that "zh" is not standard. I found this line in apache2's default httpd.conf:

# Simplified Chinese (zh-CN)
AddLanguage zh-CN .zh-cn

So it seems zh-CN is the correct language attribute to send. But mozilla
seems to refuse it:

body:lang(zh-CN) { font-size: 14pt}

This line always makes the body font 14pt even if lang in http header is
set to "en". But

body:lang(cn) { font-size: 14pt}

is recognized by mozilla. This line only makes the body font 14pt when
the header is lang=cn.

So apache suggests "zh-CN", mozilla uses "cn". I happen to know a few
others, say "zh_CN" "zh_EUC" and "zh-EUC". Today I read w3c's suggestion:
http://www.w3.org/International/ques...-css-lang.html

there is a:
<li xml:lang="zh-Hans" lang="zh-Hans">»ΆΣ*</li>

"zh-Hans" is new to me: neither apache nor mozilla seems to know it.

So I'm puzzled: is there something I can rely on? Should I use "cn" all
the way or not?

Please point me to where I can find the related standard.

Jul 20 '05 #1
6 Replies


Zhang Weiwu <zh********@realss.com> wrote:
> Hello. I am working with a php software project; in it
> (www.egroupware.org) the Chinese simplified locale is "zh" while
> Traditional Chinese is "tw".

I presume you refer to locale names here. Locales are a world of their
own, and many people think that they are a wrong approach to the
problems of cultural diversity. HTML itself has nothing to do with
locales, even though some locale names may coincide with values of the
lang="..." attribute.

> I wish to send the correct language attribute in the http header,

That's not of much use, but if you do so, the correct values are
ISO 639 language codes optionally followed by a subcode. Actually the
details are a bit muddy, since the HTTP protocol definition refers to
RFC 1766, which has now been obsoleted by RFC 3066 and RFC 3282.
But apparently the latter is to be taken as overriding HTTP/1.1
specification if there is a conflict, since it explicitly defines the
Content-Language header.

> I found "zh" is not standard.

It definitely is the standard ISO 639 code for the Chinese language and
the one that shall be used both in lang="..." attributes in HTML and in
Content-Language headers in HTTP. Whether it is followed by a subcode
is a different issue, and so is the fairly complex question what really
constitutes "the Chinese language" here, but apparently it is to be
understood in a very broad sense (not limited to putonghua).

> I found this line in apache2's default
> httpd.conf
>
> # Simplified Chinese (zh-CN)
> AddLanguage zh-CN .zh-cn
>
> So it seems zh-CN is the correct language attribute to send.

The Apache default configuration has no authoritative role. It is
something that should comply with specifications, _be_ correct by the
specs, not try to define what is correct.

But zh-CN is _a_ correct value for the Content-Language header.
It specifies the particular variant of Chinese spoken in China (country
code CN), though this probably raises more questions than it can
answer. This is very confusing, since what people probably _mean_ when
they use that code is the "simplified Chinese" _writing system_, and for
such purposes, there are also some IANA registered subcodes, see
http://www.iana.org/assignments/language-tags
which lists, among others, "zh-Hans" defined as 'Chinese, in simplified
script'. But the fact that it is registered properly according to the
procedures set up in the relevant RFCs doesn't mean that it would be used
and usable in practice.

Whether Content-Language should specify "zh-CN" is debatable. According
to the HTTP protocol, "The Content-Language entity-header field
describes the natural language(s) of the intended audience for the
enclosed entity" (i.e., of the document sent). Note that it is not
defined as the language of the document. The distinction is subtle, but
it becomes important when subcodes are included.

Content-Language: zh-CN
says, by the protocol, that the document is intended for people who
understand Chinese in the form used in China. It is beyond my
competence to decide whether this would be adequate, but I think I do
know that it would be incorrect to send, for example,
Content-Language: en-GB
except perhaps in very special cases. Surely people who speak, for
example, the "standard" US version of English would reasonably
understand British English as well.

So normally the Content-Language header, if used, should only specify
the major language code, such as zh or en.
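
As a minimal sketch of that advice, reducing a tag (or a locale-style name with an underscore) to its primary code before sending it might look like this in Python. The function names are made up for the illustration and are not part of any server API, and the tag is simply treated as a hyphen-separated string:

```python
def primary_language(tag: str) -> str:
    """Return the primary (first) subtag of a language tag, lowercased."""
    # Locale names often use "_" where language tags use "-"; treating them
    # alike is an assumption made for this sketch, not a rule from a standard.
    return tag.replace("_", "-").split("-")[0].lower()


def content_language_header(tag: str) -> str:
    """Build a Content-Language header line carrying only the primary code."""
    return "Content-Language: " + primary_language(tag)


print(content_language_header("zh-CN"))  # Content-Language: zh
print(content_language_header("en_GB"))  # Content-Language: en
```

Whether dropping the subcode is appropriate for a given document is still a judgment about the intended audience; the code only shows the mechanics.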

> But
> mozilla seems to refuse it:
>
> body:lang(zh-CN) { font-size: 14pt}

That construct is a CSS rule, not HTML or HTTP at all, though it may
relate to both:
"If the document language specifies how the human language of an
element is determined, it is possible to write selectors in CSS that
match an element based on its language. For example, in HTML [HTML40],
the language is determined by a combination of the "lang" attribute,
the META element, and possibly by information from the protocol (such
as HTTP headers). XML uses an attribute called xml:lang, and there may
be other document language-specific methods for determining the
language."
http://www.w3.org/TR/REC-CSS2/selector.html#lang
This is rather vague - it does not really define whether a browser
_shall_, _should_, or _may_ use HTTP headers to define what elements
the :lang(...) selector matches, but clearly it is meant that the
language specified in lang="..." attributes in HTML takes precedence.
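
For illustration, the matching test that CSS2 describes for :lang(C) (the element's language equals C, or begins with C immediately followed by "-") can be sketched in Python. The function name is hypothetical, and the sketch deliberately leaves out the vague part, namely how a browser determines the element's language in the first place:

```python
def lang_selector_matches(selector_code: str, element_lang: str) -> bool:
    """Emulate the CSS2 :lang(C) test against an element's resolved language."""
    code = selector_code.lower()
    lang = element_lang.lower()
    # Match on equality, or on a prefix that ends exactly at a hyphen.
    return lang == code or lang.startswith(code + "-")


print(lang_selector_matches("zh", "zh-CN"))     # True
print(lang_selector_matches("zh-CN", "zh"))     # False
print(lang_selector_matches("zh-CN", "en"))     # False
print(lang_selector_matches("cn", "zh-CN"))     # False
```

By this rule, body:lang(zh-CN) has no reason to match a document whose language is "en".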

(Besides, the rule tries to enforce a fixed font size for the body of a
document, which is almost always a poor idea on the Web. But this is a
different can of worms.)

> This line always makes the body font 14pt even if lang in http
> header is set to "en".

That sounds like a Mozilla bug. It's really off-topic in this group,
since it's about a browser's CSS implementation.

> But
>
> body:lang(cn) { font-size: 14pt}
>
> is recognized by mozilla. This line only makes the body font 14pt
> when the header is lang=cn.

Header lang=cn? Is this about HTTP headers, or about the lang="..."
attribute in HTML? In either case though, the code "cn" is definitely
incorrect. There is no language code "cn" assigned in ISO 639, and it
must not be used by private agreement either, since all two-letter
language codes are reserved for allocation by the ISO.
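
To make the point about "cn" concrete, here is a small Python sketch. The set below is a hand-picked sample of ISO 639 two-letter codes, not the full list, and the function name is hypothetical; a real check would need the complete code table.

```python
# A small sample of ISO 639 two-letter language codes, for illustration only.
SAMPLE_ISO_639_CODES = {"zh", "en", "fi", "fr", "de", "ja"}


def has_known_language_code(tag: str) -> bool:
    """Check whether the primary subtag of a tag is in the sample code set."""
    primary = tag.split("-")[0].lower()
    return primary in SAMPLE_ISO_639_CODES


print(has_known_language_code("zh-CN"))  # True
print(has_known_language_code("cn"))     # False: CN is a country code
```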

But a browser is not expected to know such things in practice. It
treats the language code just as a string, though it may recognize some
codes and do something sensible based on its knowing what the language
of the text is (although this is mostly just wishful thinking).

> I happen to know a few
> others, say "zh_CN" "zh_EUC" and "zh-EUC".

Welcome to the meta-Babel of language codes. Language codes are well
standardized: every group has its own standards or "standards".
Compared to that, HTML and HTTP have fairly fixed rules what the codes
mean and what codes shall be used. Too bad so few programs behave by
the rules.

> Today I read w3c's
> suggestion:
> http://www.w3.org/International/questions/qa-css-lang.html

Beware that it's mostly just wishful thinking. For example, IE, the
dominant browser, knows nothing about any CSS selectors used there, and
this situation should not be expected to change in the next few years.

Note that the document is descriptive, not normative. As far as I can
see, it complies with W3C recommendations, which is not surprising.
And it seems to paint a correct picture about the situation with
Chinese, except that it does not quite explicitly say this:
zh-CN and zh-TW are not the correct way to indicate writing system,
but they are what some browsers recognize, whereas the correct way
is ignored by browsers.

> "zh-Hans" is new to me: neither apache nor mozilla seems to know
> it.

It is registered by IANA. But as you see, browser (and server) vendors
didn't notice this any more than most of us have.

> So I'm puzzled: is there something I can rely on?

No.

> Should I use "cn" all the way or not?


Using <html lang="zh"> is correct if your document is in Chinese. It
won't help much (if anything) at present, but neither should it cause
problems.

If you use <html lang="zh-CN">, then some browsers will select
simplified Chinese glyphs, which is probably what you want then,
although this is hardly the theoretically correct way to go.
Similarly, to get traditional Chinese glyphs, you could use
<html lang="zh-TW">, and this would work on some browsers.

All the rest is probably just futile at present.
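
As a rough sketch of the pragmatic approach, the application's own locale names from the question ("zh" and "tw") could be translated into the zh-CN and zh-TW values that some browsers recognize. The mapping and the function name below are assumptions made for illustration, not anything taken from the software in question:

```python
# Hypothetical mapping from the application's locale names (as described in
# the question) to lang attribute values that some browsers act on.
LOCALE_TO_LANG = {
    "zh": "zh-CN",  # simplified Chinese locale in the application
    "tw": "zh-TW",  # traditional Chinese locale in the application
}


def html_open_tag(locale: str) -> str:
    """Return an <html> start tag with a lang value for the given locale."""
    # Locales without a special mapping are passed through unchanged.
    lang = LOCALE_TO_LANG.get(locale, locale)
    return '<html lang="%s">' % lang


print(html_open_tag("zh"))  # <html lang="zh-CN">
print(html_open_tag("tw"))  # <html lang="zh-TW">
```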

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #2

Zhang Weiwu <zh********@realss.com> wrote in message news:<c0**********@mail.cn99.com>...
> body:lang(cn) { font-size: 14pt}
>
> is recognized by mozilla. This line only makes the body font 14pt when
> the header is lang=cn.

There is no language identified as 'cn'. I presume it still works if
you correct it to lang="zh"?

> Today I read w3c's suggestion:
> http://www.w3.org/International/ques...-css-lang.html
> there is a:
> <li xml:lang="zh-Hans" lang="zh-Hans">»ΆΣ*</li>
>
> "zh-Hans" is new to me: neither apache nor mozilla seems to know it.


zh-Hans is the new code for simplified Chinese, and zh-Hant is the new
code for traditional Chinese. In an ideal world these would be what
you'd use, but many more programs recognise the old codes than the new
codes.

I think you'll have more success for the moment using zh-cn and zh-hk
for simplified and traditional Chinese respectively.

Sorry I can't help on the Apache issue.

--- Safalra (Stephen Morley) ---
http://www.safalra.com/hypertext
Jul 20 '05 #3

DU
Zhang Weiwu wrote:
> Hello. I am working with a php software project; in it
> (www.egroupware.org) the Chinese simplified locale is "zh" while Traditional
> Chinese is "tw".
>
> I wish to send the correct language attribute in the http header, and I found
> that "zh" is not standard. I found this line in apache2's default httpd.conf:
>
> # Simplified Chinese (zh-CN)
> AddLanguage zh-CN .zh-cn
>
> So it seems zh-CN is the correct language attribute to send. But mozilla
> seems to refuse it:
>
> body:lang(zh-CN) { font-size: 14pt}
>
> This line always makes the body font 14pt even if lang in http header is
> set to "en". But
>
> body:lang(cn) { font-size: 14pt}
>
> is recognized by mozilla. This line only makes the body font 14pt when
> the header is lang=cn.

I strongly advise you to avoid absolute font sizes: I suggest you use % or
em instead so that the page text remains scalable in MSIE browsers.

> So apache suggests "zh-CN", mozilla uses "cn". I happen to know a few
> others, say "zh_CN" "zh_EUC" and "zh-EUC". Today I read w3c's suggestion:
> http://www.w3.org/International/ques...-css-lang.html
>
> there is a:
> <li xml:lang="zh-Hans" lang="zh-Hans">»ΆΣ*</li>
>
> "zh-Hans" is new to me: neither apache nor mozilla seems to know it.
>
> So I'm puzzled: is there something I can rely on? Should I use "cn" all
> the way or not?
>
> Please point me to where I can find the related standard.


I have a few webpages written in simplified Chinese and had to tackle
this issue. One problem I had to overcome was avoiding a font
download modal window for Western users (when they do not have any
fonts for Chinese).
This was my test page for NS 7 users:
http://www10.brinkster.com/doctorunc...logWindow.html
The language is defined by lang="zh"; the CN part only refers to the
country (as a sub-code). I think the most important issue, not mentioned
at all in your post, is the character encoding. I was told to use only
gb2312 to support simplified Chinese. And if you look around,

"GB is abbreviated from GuoBiao, which in turn is abbreviated from
GuoJia BiaoZhun, meaning "national standard." Most references to GB mean
the GB 2312-80 character set standard, established by mainland China in
1981 to represent simplified Chinese characters."
Chinese and Information Technology
Introduction to Chinese Information Processing
Chinese Character Set Standards
http://www.ldc.upenn.edu/Projects/Chinese/info_it.htm

"Big5 encodes traditional characters and is used in Hong Kong and
Taiwan, while GB encodes simplified characters and is used in mainland
China and Singapore."
Creating Chinese Web Pages
http://users.erols.com/eepeter/chine...ernet/web.html

you'll see that this is mighty important to do.
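
A quick way to see why the encoding matters is to try encoding sample text with each charset. The Python sketch below relies on the standard library's gb2312 and big5 codecs; the two sample strings and the helper name `can_encode` are my own choices for illustration.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


simplified = "简体中文"   # "Simplified Chinese", written in simplified characters
traditional = "繁體中文"  # "Traditional Chinese", written in traditional characters

print(can_encode(simplified, "gb2312"))   # True
print(can_encode(traditional, "big5"))    # True
print(can_encode(traditional, "gb2312"))  # False: GB 2312 lacks this traditional form
```

A page declared as gb2312 therefore cannot carry arbitrary traditional text, whatever its lang attribute says.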

You can check my pages, written for simplified Chinese:

http://www10.brinkster.com/doctorunc...scriptCSS.html

http://www10.brinkster.com/doctorunc...ationPage.html
(old page: please don't use the flag rollover: that's considered bad... at
least the ones without "Chinese" and the GB2312)

Please, please, let me know if you have a problem with my Chinese pages. :)

DU
Jul 20 '05 #4

DU
Zhang Weiwu wrote:

[snipped]

Chinese LANGUAGE TAGS from IANA:
(last updated 2003-10-09)

zh-Hans     Chinese, in simplified script                    [Davis]
zh-Hant     Chinese, in traditional script                   [Davis]
zh-gan      Kan or Gan                                       [Compton]
zh-guoyu    Mandarin or Standard Chinese                     [Compton]
zh-hakka    Hakka                                            [Compton, Cowan]
zh-min      Min, Fuzhou, Hokkien, Amoy or Taiwanese          [Compton]
zh-min-nan  Minnan, Hokkien, Amoy, Taiwanese, Southern Min,  [Tai]
            Southern Fujian, Hoklo, Southern Fukien, Ho-lo
zh-wuu      Shanghaiese or Wu                                [Compton]
zh-xiang    Xiang or Hunanese                                [Compton]
zh-yue      Cantonese                                        [Compton]

Taken from
http://www.iana.org/assignments/language-tags

but as mentioned before, you still *_need to declare the proper
character encoding_* for the whole page. Declaring the lang attribute
alone might not be enough.
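
As a sketch of declaring both pieces of information together, the response headers and the equivalent META tag could be built like this in Python. The function names are hypothetical and the values passed in are only examples; which language code and charset to send is still your decision.

```python
def response_headers(lang: str, charset: str) -> dict:
    """Return HTTP headers declaring the character encoding and the language."""
    return {
        "Content-Type": "text/html; charset=" + charset,
        "Content-Language": lang,
    }


def meta_charset_tag(charset: str) -> str:
    """Return the HTML META element that repeats the encoding in the page."""
    return (
        '<meta http-equiv="Content-Type" content="text/html; charset=%s">'
        % charset
    )


for name, value in response_headers("zh", "gb2312").items():
    print(name + ": " + value)
print(meta_charset_tag("gb2312"))
```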

DU
Jul 20 '05 #5

DU
Zhang Weiwu wrote:
> Hello. I am working with a php software project; in it
> (www.egroupware.org) the Chinese simplified locale is "zh" while Traditional
> Chinese is "tw".
>
> I wish to send the correct language attribute in the http header, and I found
> that "zh" is not standard. I found this line in apache2's default httpd.conf:
>
> # Simplified Chinese (zh-CN)
> AddLanguage zh-CN .zh-cn
>
> So it seems zh-CN is the correct language attribute to send. But mozilla
> seems to refuse it:
>
> body:lang(zh-CN) { font-size: 14pt}

Be careful. Right now, very few browsers support the lang selector.
According to
http://www.westciv.com/style_master/...selectors.html
only MSIE 5 for Mac supports the lang selector. Not sure but I believe
(and I could be wrong) the latest Mozilla 1.6+ supports the lang
selector and also Opera 7.50 PR1.

> This line always makes the body font 14pt even if lang in http header is
> set to "en". But
>
> body:lang(cn) { font-size: 14pt}

This is an error, so it will be ignored. "cn" is not a valid language
but rather a country sub-code.
cn is not listed anywhere in the iso 639-2 spec:
http://www.loc.gov/standards/iso639-2/langcodes.html#cd


> is recognized by mozilla. This line only makes the body font 14pt when
> the header is lang=cn.

This is due to/results of rules in case of parsing errors:
4.2 Rules for handling parsing errors
http://www.w3.org/TR/CSS2/syndata.html#parsing-errors


Check what *your* default font sizes are in Mozilla for these languages:

Edit/Preferences.../Appearance/Fonts/
(drop down list) Fonts for: Western/
and then see the font-size for Proportional and Minimum font size
Repeat the same steps for
(drop down list) Fonts for: Simplified Chinese/

DU

Jul 20 '05 #6

DU
DU wrote:
> Zhang Weiwu wrote:
>
> [snipped]
>
>> But
>> mozilla seems to refuse it:
>>
>> body:lang(zh-CN) { font-size: 14pt}
>
> Be careful. Right now, very few browsers support the lang selector.
> According to
> http://www.westciv.com/style_master/...selectors.html
> only MSIE 5 for Mac supports the lang selector. Not sure but I believe
> (and I could be wrong) the latest Mozilla 1.6+ supports the lang
> selector and also Opera 7.50 PR1.


If you really need to specify a font-size for some Chinese-lang-defined
elements, then use the attribute selector instead of the :lang
pseudo-class: :lang as a pseudo-class is not widely supported, even among
recent browsers.

E.g.:
p[lang="zh"] {font-size:120%;}
is much more widely supported than
p:lang(zh) {font-size:120%;}
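
One caveat with this substitution: an attribute selector only looks at the element's own lang attribute, so it does not behave identically to :lang. The Python sketch below emulates [lang="zh"] and the related [lang|="zh"] form; the function names are hypothetical, and the comparison is kept case-sensitive as a simplification.

```python
from typing import Optional


def attr_equals(own_lang: Optional[str], value: str) -> bool:
    """Emulate [lang="value"]: the element's own attribute must equal value."""
    return own_lang == value


def attr_dash_match(own_lang: Optional[str], value: str) -> bool:
    """Emulate [lang|="value"]: equal to value, or starting with value + "-"."""
    if own_lang is None:
        return False
    return own_lang == value or own_lang.startswith(value + "-")


# (description, the element's own lang attribute, or None if it has none)
samples = [
    ('<p lang="zh">', "zh"),
    ('<p lang="zh-CN">', "zh-CN"),
    ('<p> inside <body lang="zh">', None),
]
for description, own_lang in samples:
    print(description, attr_equals(own_lang, "zh"), attr_dash_match(own_lang, "zh"))
```

So p[lang="zh"] misses a paragraph marked lang="zh-CN", and neither attribute form matches a paragraph that merely inherits its language from an ancestor.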

This testpage on lang as a selector (not as a pseudo-class)

http://www.editions-eyrolles.com/css...s/select10.htm

works in Mozilla 1.7a and in Opera 7.50 PR1

DU
Jul 20 '05 #7
