More than one language in a page 
October 21st, 2008, 10:45 PM
What is the correct way to mark up, say, a div or p, to indicate
that it is in a different language to the main page? Are there
any potential pitfalls with different browsers associated with
doing this? If it makes any difference, this is in reference to
a page in mixed English and French: http://www.chem.utoronto.ca/IChO.Ontario/index.html | 
October 21st, 2008, 11:15 PM
| | | | re: More than one language in a page
On 2008-10-21, David Stone <no.email@domain.invalid> wrote: Quote:
What is the correct way to mark up, say, a div or p, to indicate
that it is in a different language to the main page?
| You just do <div lang="en"> etc. What browsers actually do with that
lang attribute is not so clear. In most cases probably nothing, although
it may influence choice of font in some.
Most fonts I've seen that are used for English will also contain all the
glyphs needed for French anyway.
I don't know if aural renderers use it to influence choice of speech
synthesizer. I doubt it, but you never know. | 
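For the page in question, the markup described above might look like the following minimal sketch. The sentences are made up for illustration; only the lang attributes matter here.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Mixed English and French</title>
</head>
<body>
  <!-- Inherits lang="en" from the html element -->
  <p>Welcome to the competition site.</p>

  <!-- Overrides the inherited language for this paragraph only -->
  <p lang="fr">Bienvenue sur le site de la compétition.</p>

  <!-- lang also works on an inline element for a short foreign phrase -->
  <p>The French motto is <span lang="fr">Liberté, égalité, fraternité</span>.</p>
</body>
</html>
```

Setting the main language once on the html element and overriding it only where the language changes keeps the markup short, since lang is inherited by descendant elements.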
October 22nd, 2008, 09:15 AM
| | | | re: More than one language in a page
Ben C wrote: Quote:
You just do <div lang="en"> etc. What browsers actually do with that
lang attribute is not so clear. In most cases probably nothing, although
it may influence choice of font in some.
| Well, some languages display right-to-left so the difference there
should be significant. I believe that there are also spacing issues
around some punctuation, and word-splitting issues as well. Of course,
it's all down to the care with which the browser was coded.
--
Steve Swift http://www.swiftys.org.uk/swifty.html http://www.ringers.org.uk | 
October 22nd, 2008, 09:35 AM
| | | | re: More than one language in a page
On 2008-10-22, Swifty <steve.j.swift@gmail.com> wrote: Quote:
Ben C wrote: Quote:
>You just do <div lang="en"> etc. What browsers actually do with that
>lang attribute is not so clear. In most cases probably nothing, although
>it may influence choice of font in some.
| >
Well, some languages display right-to-left so the difference there
should be significant.
| For that you've got to use dir=rtl or "direction: rtl". lang=ar by
itself won't make any difference. Quote:
I believe that there are also spacing issues around some punctuation,
and word-splitting issues as well. Of course, it's all down to the
care with which the browser was coded.
| I haven't seen lang making a difference, but perhaps it should. Some
browsers use something based on Unicode Annex 14 for line-breaking, and
language is not involved in the algorithm they describe there.
See also http://www.cs.tut.fi/~jkorpela/unicode/linebr.html | 
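The distinction between declaring a language and setting a direction can be sketched as follows. The Arabic word (مرحبا, "hello") and the class name rtl-block are placeholders chosen for this example, not anything from the original page.

```html
<style>
  /* CSS counterpart of dir="rtl" for a block element */
  .rtl-block { direction: rtl; }
</style>

<!-- lang alone: declares the language, base direction stays left-to-right -->
<p lang="ar">مرحبا HTML</p>

<!-- lang plus dir: language declared and base direction set to right-to-left -->
<p lang="ar" dir="rtl">مرحبا HTML</p>

<!-- Same base direction set through CSS instead of the dir attribute -->
<p lang="ar" class="rtl-block">مرحبا HTML</p>
```

Comparing the first paragraph with the other two in a browser shows the effect of the base direction on alignment and on the order of the Arabic and Latin runs.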
October 22nd, 2008, 09:55 AM
| | | | re: More than one language in a page
On Wed, 22 Oct 2008, Ben C wrote: Quote:
For that you've got to use dir=rtl or "direction: rtl". lang=ar by
itself won't make any difference.
| But the use of Arabic script should make a difference without the need of
specifying the writing direction. Quote: Quote:
I believe that there are also spacing issues around some punctuation,
and word-splitting issues as well. Of course, it's all down to the
care with which the browser was coded.
| | One example could be the interpretation of a quote symbol like <q>:
<p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>
should be rendered as
The word ``chef´´ is of French origin.
whereas the (incorrect)
<p lang="en">The word <span lang="fr"><q>chef</q></span> is of French origin.</p>
as
The word « chef » is of French origin.
--
Helmut Richter | 
October 22nd, 2008, 10:35 AM
| | | | re: More than one language in a page
Helmut Richter schreef: Quote:
On Wed, 22 Oct 2008, Ben C wrote:
> Quote:
>For that you've got to use dir=rtl or "direction: rtl". lang=ar by
>itself won't make any difference.
| >
But the use of Arabic script should make a difference without the need of
specifying the writing direction.
> Quote: Quote:
>>I believe that there are also spacing issues around some punctuation,
>>and word-splitting issues as well. Of course, it's all down to the
>>care with which the browser was coded.
| | >
One example could be the interpretation of a quote symbol like <q>:
>
<p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>
>
should be rendered as
>
The word ``chef´´ is of French origin.
| You mean
The word “chef” is of French origin.
:-p
H.
--
Hendrik Maryns http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
Ask smart questions, get good answers: http://www.catb.org/~esr/faqs/smart-questions.html | 
October 22nd, 2008, 11:15 AM
| | | | re: More than one language in a page
On 2008-10-22, Helmut Richter <hhr-m@web.de> wrote: Quote:
On Wed, 22 Oct 2008, Ben C wrote:
> Quote:
>For that you've got to use dir=rtl or "direction: rtl". lang=ar by
>itself won't make any difference.
| >
But the use of Arabic script should make a difference without the need of
specifying the writing direction.
| It will make a difference but it won't be quite right in all
circumstances.
For a simple Arabic string it will be OK (although left-aligned), but if
you've got Roman characters embedded in there, the bidi base direction
will be wrong.
You can see the result of bidi base direction if you try an example like
this:
<div dir="rtl">
ARABIC hello
</div>
which should appear as "hello CIBARA"
<div dir="ltr">
ARABIC hello
</div>
which should appear as "CIBARA hello".
I'm using capitals to mean strongly right-to-left characters-- of course
you'd need real Arabic in the example for it to work.
Unicode Annex 9 defines three bidi base directions: left-to-right,
right-to-left and neutral.
In HTML and CSS specifications you get left-to-right unless you specify
dir or direction respectively to get right-to-left. You can't have
neutral (except perhaps in a textarea or input). Quote: Quote: Quote:
I believe that there are also spacing issues around some punctuation,
and word-splitting issues as well. Of course, it's all down to the
care with which the browser was coded.
| | >
One example could be the interpretation of a quote symbol like <q>:
>
><p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>
>
should be rendered as
>
The word ``chef´´ is of French origin.
>
whereas the (incorrect)
>
><p lang="en">The word <span lang="fr"><q>chef</q></span> is of French origin.</p>
>
as
>
The word « chef » is of French origin.
| Yes and there is stuff in CSS to do all that-- see the "quotes"
property, content: open-quote, and lang pseudos in CSS 2.1.
Not sure if any of the browsers actually implement all that stuff
though.
I think Korpela recommends just typing the quote characters you want and
not bothering with <q>, but I hope I'm not misquoting him [pause for
groans]. | 
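The CSS 2.1 features mentioned here (the quotes property, content: open-quote and the :lang() pseudo-class) can be combined roughly as in the sketch below. The particular quote characters chosen for each language are an assumption for illustration, and, as noted above, browser support may vary.

```html
<style>
  /* Quote marks chosen by the language of the q element itself */
  q:lang(en) { quotes: "\201C" "\201D" "\2018" "\2019"; }
  q:lang(fr) { quotes: "«\00A0" "\00A0»" "\201C" "\201D"; }

  /* Generated quote marks before and after each q element */
  q:before { content: open-quote; }
  q:after  { content: close-quote; }
</style>

<!-- q is in English, only the word is French: English quotes expected -->
<p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>

<!-- q is inside the French span: French guillemets expected -->
<p lang="en">The word <span lang="fr"><q>chef</q></span> is of French origin.</p>
```

The escapes \201C and \201D are the curly double quotes, and \00A0 is a no-break space placed between the guillemets and the quoted text.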
October 22nd, 2008, 11:25 AM
| | | | re: More than one language in a page
On Wed, 22 Oct 2008, Stefan Ram wrote: Quote:
(However, »chef« as used above actually is the English
word (because it is said that it was of French origin),
and so it should not be marked as French.
>
The English word »chef« is of French origin.
>
The French word »chef« is not of French origin, it /is/ French.)
| Right. I should have chosen a better example.
--
Helmut Richter | 
October 22nd, 2008, 11:55 AM
| | | | re: More than one language in a page
On Wed, 22 Oct 2008, Hendrik Maryns wrote: Quote: Quote:
One example could be the interpretation of a quote symbol like <q>:
<p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>
should be rendered as
The word ``chef´´ is of French origin.
| >
You mean
>
The word «chef» is of French origin.
| No. I meant what I wrote.
1) When the quotes are in the outer text, they are English. These are also the
correct quotes (at least according to German quote rules where the *outer*
language determines the form of the quotes at least as long as the quoted
text is not a paragraph of its own).
2) Guillemets are used with a space to the enclosed text:
« chef »
In German, they are sometimes used the other way round without spaces
instead of other quotes:
»chef«
--
Helmut Richter | 
October 22nd, 2008, 02:25 PM
| | | | re: More than one language in a page
Helmut Richter schreef: Quote:
On Wed, 22 Oct 2008, Hendrik Maryns wrote:
>> Quote: Quote:
>>One example could be the interpretation of a quote symbol like <q>:
>>>
>><p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>
>>>
>>should be rendered as
>>>
>> The word ``chef´´ is of French origin.
| >You mean
>>
> The word «chef» is of French origin.
| >
No. I meant what I wrote.
| This is interesting. I did not type «» (i.e. guillemets) at all. I
actually typed “” (i.e. proper curly open and close quotes); it seems
like your newsreader has interpreted them as guillemets anyway. Funny.
I suppose you (or me?) have an encoding problem.
H.
--
Hendrik Maryns http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
Ask smart questions, get good answers: http://www.catb.org/~esr/faqs/smart-questions.html | 
October 22nd, 2008, 02:35 PM
| | | | re: More than one language in a page
On Wed, 22 Oct 2008, Hendrik Maryns wrote: Quote:
This is interesting. I did not type «» (i.e. guillemets) at all. I
actually typed «» (i.e. proper curly open and close quotes); it seems
like your newsreader has interpreted them as guillemets anyway. Funny.
I suppose you (or me?) have an encoding problem.
| It is me who has an encoding problem¹). Had it resulted in illegible
characters (?chef?), I would have checked. But as it looked like a
possibly intended usage of guillemets I did not check. I am sorry for the
oversight.
¹) The newsreader correctly converts UTF-8 to ISO-8859-1 if the character
exists there. Other characters are converted to something the newsreader
considers appropriate. I was not aware that English quotes are converted
to guillemets: it is much too seldom that I receive text with English
quotes.
--
Helmut Richter | 
October 22nd, 2008, 07:55 PM
| | | | re: More than one language in a page
Ben C wrote: Quote: |
You just do <div lang="en"> etc.
| That's a right thing to do, though in practical terms, it does not matter
much. Quote:
What browsers actually do with that
lang attribute is not so clear. In most cases probably nothing,
although it may influence choice of font in some.
| Mostly for East Asian languages, and only when the page does not set font -
and most pages do, no matter what we think about that. Quote:
Most fonts I've seen that are used for English will also contain all
the glyphs needed for French anyway.
| Well, yes, and I would expect any browser default font to contain all French
characters, anyway. Quote:
I don't know if aural renderers use it to influence choice of speech
synthesizer.
| Some of them do use it, at least optionally. But in fact, considering the web as a
whole, good algorithmic language guessing (from the content) generally
produces better results. There are so many non-English pages incorrectly
marked up as English, due to misunderstandings or, most often, due to web
authoring software defaults.
--
Yucca, http://www.cs.tut.fi/~jkorpela/ | 
October 22nd, 2008, 09:15 PM
| | | | re: More than one language in a page
On Wed, 22 Oct 2008, Jukka K. Korpela wrote: Quote:
Some of them use, at least optionally. But in fact, considering the web as a
whole, good algorithmic language guessing (from the content) generally
produces better results.
| And in most contexts there are not many languages to choose from. I am now
transferring data to another CMS, and I use the simple algorithm "if there
are twice as many "the" (as single words) as "der", the language is
"en", otherwise "de". It is *much* more reliable than trusting the
language explicitly specified by the authors in the old CMS.
--
Helmut Richter | 
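The word-count heuristic described above could be sketched in a page with a small inline script, as below. The function names, the sample sentence and the tie-breaking are assumptions for illustration: "twice as many" is read as "at least twice as many", and a text containing neither word falls back to "de".

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>the/der language guess</title>
</head>
<body>
  <p id="sample">Der Hund und der Mann gehen in der Stadt spazieren.</p>
  <script>
    // Count whole-word, case-insensitive occurrences of a word.
    function countWord(text, word) {
      var matches = text.match(new RegExp("\\b" + word + "\\b", "gi"));
      return matches ? matches.length : 0;
    }

    // Assumption: "en" only if "the" occurs at least twice as often as "der";
    // a text with no "the" at all is treated as "de".
    function guessLang(text) {
      var the = countWord(text, "the");
      var der = countWord(text, "der");
      return (the > 0 && the >= 2 * der) ? "en" : "de";
    }

    // Apply the guess as the lang attribute of the sample paragraph.
    var sample = document.getElementById("sample");
    sample.lang = guessLang(sample.textContent);
  </script>
</body>
</html>
```

A two-language heuristic like this only works when the set of candidate languages is known in advance, which is the situation described in the post.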
October 23rd, 2008, 01:15 PM
| | | | re: More than one language in a page
In article
<Pine.LNX.4.64.0810222206530.4962@lxhri01.lrz.lrz-muenchen.de>,
Helmut Richter <hhr-m@web.de> wrote: Quote:
On Wed, 22 Oct 2008, Jukka K. Korpela wrote:
> Quote:
Some of them use, at least optionally. But in fact, considering the web as a
whole, good algorithmic language guessing (from the content) generally
produces better results.
| >
And in most contexts there are not many languages to choose from. I am now
transferring data to another CMS, and I use the simple algorithm "if there
are twice as many "the" (as single words) than "der", the language is
"en", otherwise "de". It is *much* more reliable than trusting the
language explicitly specified by the authors in the old CMS.
| So what everyone seems to be saying is that there isn't much practical
point in specifying page language, except (i) if it requires a particular
character set (which is specified separately), (ii) if it differs from
left-to-right direction (which is specified separately), and/or (iii) to
be nice? | 
October 23rd, 2008, 03:45 PM
| | | | re: More than one language in a page
On Thu, 23 Oct 2008, I wrote: "some experiments" of course <sigh> | 
October 23rd, 2008, 04:15 PM
| | | | re: More than one language in a page
On Wed, 22 Oct 2008, Stefan Ram wrote: Very good! Thank you for the hint.
Even <TBODY class="notranslate"> is recognized.
Some browsers like Internet Explorer tend to ignore TBODY.
--
In memoriam Alan J. Flavell http://www.alanflavell.org.uk/charset/ | 
October 23rd, 2008, 04:25 PM
| | | | re: More than one language in a page
On 2008-10-23, David Stone <no.email@domain.invalid> wrote: Quote:
In article
><Pine.LNX.4.64.0810222206530.4962@lxhri01.lrz.lrz-muenchen.de>,
Helmut Richter <hhr-m@web.de> wrote:
> Quote:
>On Wed, 22 Oct 2008, Jukka K. Korpela wrote:
>> Quote:
Some of them use, at least optionally. But in fact, considering the web as a
whole, good algorithmic language guessing (from the content) generally
produces better results.
| >>
>And in most contexts there are not many languages to choose from. I am now
>transferring data to another CMS, and I use the simple algorithm "if there
>are twice as many "the" (as single words) than "der", the language is
>"en", otherwise "de". It is *much* more reliable than trusting the
>language explicitly specified by the authors in the old CMS.
| >
So what everyone seems to be saying is that there isn't much practical
point in specifying page language, except (i) if it requires a particular
character set (which is specified separately),
| More if it requires a particular font (which is usually set explicitly
or detected separately). Quote:
(ii) if it differs from
left-to-right direction (which is specified separately), and/or (iii) to
be nice?
| | 
October 23rd, 2008, 06:15 PM
| | | | re: More than one language in a page
In article
<Pine.GSO.4.63.0810231619270.1336@s5b004.rrzn.uni-hannover.de>,
Andreas Prilop <prilop4321@trashmail.net> wrote: I've honestly never considered doing so, because I've always
avoided using characters in usenet posts that aren't in the
basic ASCII set. I'd just use "euros" and "cents" (or the
ubiquitous "c") instead.
However, I did find a "Send with MIME" option in the preferences,
so I checked it. Don't know if it will affect this reply, though.
I don't think I've ever needed to do a bilingual post (largely
because I am monolingual); the reason for this particular thread
is because I am currently responsible for a web site that has to
be in English and French. Parlais Frainglais, anyone? | 
October 24th, 2008, 04:55 PM
| | | | re: More than one language in a page
On Thu, 23 Oct 2008, I wrote: Quote: >
Very good! Thank you for the hint.
Even <TBODY class="notranslate"> is recognized.
| A further note:
Google translates <TT> but it does not translate <CODE> -
even without any class=notranslate.
This is another point for semantic markup with CODE
instead of just TT.
--
In memoriam Alan J. Flavell http://www.alanflavell.org.uk/charset/ | 
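The observations in this post and the earlier one can be summarised in a short sketch. The command text and table content are placeholders, and the behaviour noted in the comments is only what was reported in this thread, not a documented guarantee.

```html
<!-- Reported as left untranslated even without any class -->
<p>Run <code>ls -l</code> to list the files.</p>

<!-- Reported as translated unless class="notranslate" is added -->
<p>Run <tt class="notranslate">ls -l</tt> to list the files.</p>

<!-- class="notranslate" on TBODY was reported as recognized too -->
<table>
  <tbody class="notranslate">
    <tr><td>chef</td><td>cuisine</td></tr>
  </tbody>
</table>
```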
October 25th, 2008, 07:05 AM
| | | | re: More than one language in a page
Andreas Prilop wrote: Quote:
Google translates <TT> but it does not translate <CODE> -
even without any class=notranslate.
This is another point for semantic markup with CODE
instead of just TT.
| There's a logical gap here, though. Computer code may well contain comments,
which are (in theory at least) supposed to be in some human language and
understandable to speakers of that language. If <CODE> implies
non-translation, then there is no way, even with explicit markup, to specify
that comments be translated.
The page http://www.google.com/intl/en/help/faq_translation.html describes
class=notranslate but no attribute for turning translation on (inside an
element that is treated as nontranslatable). Looks like command-oriented tag
design, which even forgot to provide a way to give the opposite command.
--
Yucca, http://www.cs.tut.fi/~jkorpela/ | 
November 10th, 2008, 03:55 PM
| | | | re: More than one language in a page
On Sat, 25 Oct 2008, Jukka K. Korpela wrote: Another observation: When I have
<table dir="ltr" lang="fr" class="notranslate">
Google will still mess around with it. On translating the page
from English to Arabic or Hebrew, Google changes the direction
of the table to right-to-left and the table is f*cked up.
--
In memoriam Alan J. Flavell http://www.alanflavell.org.uk/charset/ |