
More than one language in a page

David Stone
Guest
 
Posts: n/a
#1: Oct 21 '08
What is the correct way to mark up, say, a div or p, to indicate
that it is in a different language to the main page? Are there
any potential pitfalls with different browsers associated with
doing this? If it makes any difference, this is in reference to
a page in mixed English and French:
http://www.chem.utoronto.ca/IChO.Ontario/index.html

Ben C
#2: Oct 21 '08
On 2008-10-21, David Stone <no.email@domain.invalid> wrote:
Quote:
What is the correct way to mark up, say, a div or p, to indicate
that it is in a different language to the main page?
You just do <div lang="en"> etc. What browsers actually do with that
lang attribute is not so clear. In most cases probably nothing, although
it may influence choice of font in some.

Most fonts I've seen that are used for English will also contain all the
glyphs needed for French anyway.

I don't know if aural renderers use it to influence choice of speech
synthesizer. I doubt it, but you never know.
Quote:
Are there any potential pitfalls with different browsers associated
with doing this? If it makes any difference, this is in reference to a
page in mixed English and French:
http://www.chem.utoronto.ca/IChO.Ontario/index.html
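
A minimal sketch of the markup described above, for a page that is mainly English with some French. The sentences and the page title are placeholders, not taken from the page in question. The main language goes on the root element, and lang on an inner element overrides it for that element's content only.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Mixed English and French (placeholder title)</title>
</head>
<body>
  <!-- Inherits lang="en" from the html element -->
  <p>Welcome to the Olympiad pages.</p>

  <!-- Overrides the inherited language for this block and its children -->
  <div lang="fr">
    <p>Bienvenue sur les pages de l'Olympiade.</p>
  </div>

  <!-- A short inline phrase in the other language -->
  <p>The motto is <span lang="fr">liberté, égalité, fraternité</span>.</p>
</body>
</html>
```

Whether a given browser does anything visible with these attributes is a separate question, as the replies in this thread discuss.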
Swifty
#3: Oct 22 '08
Ben C wrote:
Quote:
You just do <div lang="en"> etc. What browsers actually do with that
lang attribute is not so clear. In most cases probably nothing, although
it may influence choice of font in some.
Well, some languages display right-to-left, so the difference there
should be significant. I believe that there are also spacing issues
around some punctuation, and word-splitting issues as well. Of course,
it's all down to the care with which the browser was coded.

--
Steve Swift
http://www.swiftys.org.uk/swifty.html
http://www.ringers.org.uk
Ben C
#4: Oct 22 '08
On 2008-10-22, Swifty <steve.j.swift@gmail.com> wrote:
Quote:
Ben C wrote:
Quote:
>You just do <div lang="en"> etc. What browsers actually do with that
>lang attribute is not so clear. In most cases probably nothing, although
>it may influence choice of font in some.
>
Well, some languages display right-to-left, so the difference there
should be significant.
For that you've got to use dir=rtl or "direction: rtl". lang=ar by
itself won't make any difference.
Quote:
I believe that there are also spacing issues around some punctuation,
and word-splitting issues as well. Of course, it's all down to the
care with which the browser was coded.
I haven't seen lang making a difference, but perhaps it should. Some
browsers use something based on Unicode Standard Annex #14 for
line-breaking, and language is not involved in the algorithm described there.

See also http://www.cs.tut.fi/~jkorpela/unicode/linebr.html
Helmut Richter
#5: Oct 22 '08
On Wed, 22 Oct 2008, Ben C wrote:
Quote:
For that you've got to use dir=rtl or "direction: rtl". lang=ar by
itself won't make any difference.
But the use of Arabic script should make a difference without the need of
specifying the writing direction.
Quote:
Quote:
I believe that there are also spacing issues around some punctuation,
and word-splitting issues as well. Of course, it's all down to the
care with which the browser was coded.
One example could be the interpretation of a quote symbol like <q>:

<p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>

should be rendered as

The word ``chef´´ is of French origin.

whereas the (incorrect)

<p lang="en">The word <span lang="fr"><q>chef</q></span> is of French origin.</p>

as

The word « chef » is of French origin.

--
Helmut Richter
Hendrik Maryns
#6: Oct 22 '08
Helmut Richter schreef:
Quote:
On Wed, 22 Oct 2008, Ben C wrote:
>
Quote:
>For that you've got to use dir=rtl or "direction: rtl". lang=ar by
>itself won't make any difference.
>
But the use of Arabic script should make a difference without the need of
specifying the writing direction.
>
Quote:
Quote:
>>I believe that there are also spacing issues around some punctuation,
>>and word-splitting issues as well. Of course, it's all down to the
>>care with which the browser was coded.
>
One example could be the interpretation of a quote symbol like <q>:
>
<p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>
>
should be rendered as
>
The word ``chef´´ is of French origin.
You mean

The word “chef” is of French origin.

:-p
H.
--
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
Ask smart questions, get good answers:
http://www.catb.org/~esr/faqs/smart-questions.html
Ben C
#7: Oct 22 '08
On 2008-10-22, Helmut Richter <hhr-m@web.de> wrote:
Quote:
On Wed, 22 Oct 2008, Ben C wrote:
>
Quote:
>For that you've got to use dir=rtl or "direction: rtl". lang=ar by
>itself won't make any difference.
>
But the use of Arabic script should make a difference without the need of
specifying the writing direction.
It will make a difference but it won't be quite right in all
circumstances.

For a simple Arabic string it will be OK (although left-aligned), but if
you've got Roman characters embedded in there, the bidi base direction
will be wrong.

You can see the result of bidi base direction if you try an example like
this:

<div dir="rtl">
ARABIC hello
</div>

which should appear as "hello CIBARA"

<div dir="ltr">
ARABIC hello
</div>

which should appear as "CIBARA hello".

I'm using capitals to mean strongly right-to-left characters-- of course
you'd need real Arabic in the example for it to work.

Unicode Standard Annex #9 defines three bidi base directions:
left-to-right, right-to-left and neutral.

In HTML and CSS specifications you get left-to-right unless you specify
dir or direction respectively to get right-to-left. You can't have
neutral (except perhaps in a textarea or input).
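
To try the example above with real characters, a sketch along these lines should do. The Arabic word عربي is only a placeholder standing in for ARABIC, and the orderings in the comments are the ones described above, not something checked across browsers.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Bidi base direction (placeholder title)</title>
</head>
<body>
  <!-- Base direction right-to-left: the Arabic run comes first, at the
       right edge, and "hello" is placed to its left -->
  <div dir="rtl" lang="ar">عربي hello</div>

  <!-- Base direction left-to-right: the Arabic run is at the left,
       and "hello" follows to its right -->
  <div dir="ltr" lang="ar">عربي hello</div>
</body>
</html>
```

HTML5 later added dir="auto", which takes the base direction from the first strongly directional character in the element; that is roughly the "neutral" case mentioned above.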
Quote:
Quote:
Quote:
I believe that there are also spacing issues around some punctuation,
and word-splitting issues as well. Of course, it's all down to the
care with which the browser was coded.
>
One example could be the interpretation of a quote symbol like <q>:
>
><p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>
>
should be rendered as
>
The word ``chef´´ is of French origin.
>
whereas the (incorrect)
>
><p lang="en">The word <span lang="fr"><q>chef</q></span> is of French origin.</p>
>
as
>
The word « chef » is of French origin.
Yes, and there is stuff in CSS to do all that: see the "quotes"
property, content: open-quote, and the :lang() pseudo-class in CSS 2.1.

Not sure if any of the browsers actually implement all that stuff
though.

I think Korpela recommends just typing the quote characters you want and
not bothering with <q>, but I hope I'm not misquoting him [pause for
groans].
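
For reference, a sketch of how the CSS 2.1 pieces mentioned above fit together. The particular quote characters chosen for each language are an assumption made for illustration, and browser support is not verified here.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Language-dependent quotation marks (placeholder title)</title>
  <style>
    /* Choose quotation marks by the language of the q element itself */
    q:lang(en) { quotes: "“" "”" "‘" "’"; }
    /* \A0 is a no-break space, giving guillemets with inner spacing */
    q:lang(fr) { quotes: "«\A0" "\A0»" "“" "”"; }
    /* Generate the marks around every q element */
    q:before { content: open-quote; }
    q:after  { content: close-quote; }
  </style>
</head>
<body>
  <!-- q is outside the French span, so it matches q:lang(en) -->
  <p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>

  <!-- q is inside the French span, so it matches q:lang(fr) -->
  <p lang="en">The word <span lang="fr"><q>chef</q></span> is of French origin.</p>
</body>
</html>
```

With these rules, a conforming browser should give English quotes in the first paragraph and spaced guillemets in the second, which is the distinction Helmut describes.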
Helmut Richter
#8: Oct 22 '08
On Wed, 22 Oct 2008, Stefan Ram wrote:
Quote:
(However, »chef« as used above actually is the English
word (because it is said that it was of french origin),
and so it should not be marked as french.
>
The English word »chef« is of french origin.
>
The French word »chef« is not of french origin, it /is/ french.)
Right. I should have chosen a better example.

--
Helmut Richter
Helmut Richter
#9: Oct 22 '08
On Wed, 22 Oct 2008, Hendrik Maryns wrote:
Quote:
Helmut Richter schreef:
Quote:
Quote:
One example could be the interpretation of a quote symbol like <q>:

<p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>

should be rendered as

The word ``chef´´ is of French origin.
>
You mean
>
The word «chef» is of French origin.
No. I meant what I wrote.

1) When the quotes are in the outer text, they are English. These are also the
correct quotes (at least according to German quote rules where the *outer*
language determines the form of the quotes at least as long as the quoted
text is not a paragraph of its own).

2) Guillemets are used with a space to the enclosed text:

« chef »

In German, they are sometimes used the other way round without spaces
instead of other quotes:

»chef«

--
Helmut Richter
Hendrik Maryns
#10: Oct 22 '08
Helmut Richter schreef:
Quote:
On Wed, 22 Oct 2008, Hendrik Maryns wrote:
>
Quote:
>Helmut Richter schreef:
>
Quote:
Quote:
>>One example could be the interpretation of a quote symbol like <q>:
>>>
>><p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>
>>>
>>should be rendered as
>>>
>> The word ``chef´´ is of French origin.
>You mean
>>
> The word «chef» is of French origin.
>
No. I meant what I wrote.
This is interesting. I did not type «» (i.e. guillemets) at all. I
actually typed “” (i.e. proper curly open and close quotes); it seems
like your newsreader has interpreted them as guillemets anyway. Funny.
I suppose you (or me?) have an encoding problem.

H.
--
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
Ask smart questions, get good answers:
http://www.catb.org/~esr/faqs/smart-questions.html
Helmut Richter
#11: Oct 22 '08
On Wed, 22 Oct 2008, Hendrik Maryns wrote:
Quote:
This is interesting. I did not type «» (i.e. guillemets) at all. I
actually typed «» (i.e. proper curly open and close quotes); it seems
like your newsreader has interpreted them as guillemets anyway. Funny.
I suppose you (or me?) have an encoding problem.
It is me who has an encoding problem¹). Had it resulted in illegible
characters (?chef?), I would have checked. But as it looked like a
possibly intended usage of guillemets I did not check. I am sorry for the
oversight.

¹) The newsreader correctly converts UTF-8 to ISO-8859-1 if the character
exists there. Other characters are converted to something the newsreader
considers appropriate. I was not aware that English quotes are converted
to guillemets: it is much too seldom that I receive text with English
quotes.

--
Helmut Richter
Jukka K. Korpela
#12: Oct 22 '08
Ben C wrote:
Quote:
You just do <div lang="en"> etc.
That's a right thing to do, though in practical terms, it does not matter
much.
Quote:
What browsers actually do with that
lang attribute is not so clear. In most cases probably nothing,
although it may influence choice of font in some.
Mostly for East Asian languages, and only when the page does not set font -
and most pages do, no matter what we think about that.
Quote:
Most fonts I've seen that are used for English will also contain all
the glyphs needed for French anyway.
Well, yes, and I would expect any browser default font to contain all French
characters, anyway.
Quote:
I don't know if aural renderers use it to influence choice of speech
synthesizer.
Some of them use it, at least optionally. But in fact, considering the web as a
whole, good algorithmic language guessing (from the content) generally
produces better results. There are so many non-English pages incorrectly
marked up as English, due to misunderstandings or, most often, due to web
authoring software defaults.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Helmut Richter
#13: Oct 22 '08
On Wed, 22 Oct 2008, Jukka K. Korpela wrote:
Quote:
Some of them use it, at least optionally. But in fact, considering the web as a
whole, good algorithmic language guessing (from the content) generally
produces better results.
And in most contexts there are not many languages to choose from. I am now
transferring data to another CMS, and I use the simple algorithm "if there
are twice as many 'the' (as single words) as 'der', the language is
'en', otherwise 'de'". It is *much* more reliable than trusting the
language explicitly specified by the authors in the old CMS.

--
Helmut Richter
David Stone
#14: Oct 23 '08
In article
<Pine.LNX.4.64.0810222206530.4962@lxhri01.lrz.lrz-muenchen.de>,
Helmut Richter <hhr-m@web.de> wrote:
Quote:
On Wed, 22 Oct 2008, Jukka K. Korpela wrote:
>
Quote:
Some of them use it, at least optionally. But in fact, considering the web as a
whole, good algorithmic language guessing (from the content) generally
produces better results.
>
And in most contexts there are not many languages to choose from. I am now
transferring data to another CMS, and I use the simple algorithm "if there
are twice as many "the" (as single words) than "der", the language is
"en", otherwise "de". It is *much* more reliable than trusting the
language explicitly specified by the authors in the old CMS.
So what everyone seems to be saying is that there isn't much practical
point in specifying page language, except (i) if it requires a particular
character set (which is specified separately), (ii) if it differs from
left-to-right direction (which is specified separately), and/or (iii) to
be nice?
Andreas Prilop
#15: Oct 23 '08
On Thu, 23 Oct 2008, David Stone wrote:
Quote:
User-Agent: MT-NewsWatcher/3.5.2 (PPC Mac OS X)
When you write about such a subject, you should at least set up
your newsreader properly so that it can post and display non-English
characters correctly:
http://www.smfr.org/mtnw/docs/Mime.h...sage_with_MIME
http://www.smfr.org/mtnw/docs/TextEncoding.html

Euro sign: €
Cent sign: ¢
Quote:
So what everyone seems to be saying is that there isn't much practical
point in specifying page language,
Not for English and French - but for other languages. Have a look
at http://www.unics.uni-hannover.de/nht...-attribute.htm
with Firefox and do some experience with the font settings in
your browser.

--
In memoriam Alan J. Flavell
http://www.alanflavell.org.uk/charset/
Andreas Prilop
#16: Oct 23 '08
On Thu, 23 Oct 2008, I wrote:
Quote:
Not for English and French - but for other languages. Have a look
at http://www.unics.uni-hannover.de/nht...-attribute.htm
with Firefox and do some experience with the font settings in
"some experiments" of course <sigh>
Andreas Prilop
#17: Oct 23 '08
On Wed, 22 Oct 2008, Stefan Ram wrote:
Quote:
In this case, one might even use Google's new attribute value:
>
<q><span lang="fr" class="notranslate">chef</span></q>
>
http://googlewebmastercentral.blogsp...e-barrier.html
Very good! Thank you for the hint.
Even <TBODY class="notranslate"> is recognized.
Some browsers like Internet Explorer tend to ignore TBODY.

--
In memoriam Alan J. Flavell
http://www.alanflavell.org.uk/charset/
Ben C
#18: Oct 23 '08
On 2008-10-23, David Stone <no.email@domain.invalid> wrote:
Quote:
In article
><Pine.LNX.4.64.0810222206530.4962@lxhri01.lrz.lrz-muenchen.de>,
Helmut Richter <hhr-m@web.de> wrote:
>
Quote:
>On Wed, 22 Oct 2008, Jukka K. Korpela wrote:
>>
Quote:
Some of them use it, at least optionally. But in fact, considering the web as a
whole, good algorithmic language guessing (from the content) generally
produces better results.
>>
>And in most contexts there are not many languages to choose from. I am now
>transferring data to another CMS, and I use the simple algorithm "if there
>are twice as many "the" (as single words) than "der", the language is
>"en", otherwise "de". It is *much* more reliable than trusting the
>language explicitly specified by the authors in the old CMS.
>
So what everyone seems to be saying is that there isn't much practical
point in specifying page language, except (i) if it requires a particular
character set (which is specified separately),
More if it requires a particular font (which is usually set explicitly
or detected separately).
Quote:
(ii) if it differs from
left-to-right direction (which is specified separately), and/or (iii) to
be nice?
David Stone
#19: Oct 23 '08
In article
<Pine.GSO.4.63.0810231619270.1336@s5b004.rrzn.uni-hannover.de>,
Andreas Prilop <prilop4321@trashmail.net> wrote:
Quote:
On Thu, 23 Oct 2008, David Stone wrote:
>
Quote:
User-Agent: MT-NewsWatcher/3.5.2 (PPC Mac OS X)
>
When you write about such a subject, you should at least set up
your newsreader properly so that it can post and display non-English
characters correctly:
http://www.smfr.org/mtnw/docs/Mime.h...sage_with_MIME
http://www.smfr.org/mtnw/docs/TextEncoding.html
>
Euro sign: ¤
Cent sign: ¢
I've honestly never considered doing so, because I've always
avoided using characters in usenet posts that aren't in the
basic ASCII set. I'd just use "euros" and "cents" (or the
ubiquitous "c") instead.

However, I did find a "Send with MIME" option in the preferences,
so I checked it. Don't know if it will affect this reply, though.

I don't think I've ever needed to do a bilingual post (largely
because I am monolingual); the reason for this particular thread
is because I am currently responsible for a web site that has to
be in English and French. Parlais Frainglais, anyone?
Andreas Prilop
#20: Oct 24 '08
On Thu, 23 Oct 2008, I wrote:
Quote:
>
Very good! Thank you for the hint.
Even <TBODY class="notranslate"> is recognized.
A further note:
Google translates <TT> but it does not translate <CODE> -
even without any class=notranslate.
This is another point for semantic markup with CODE
instead of just TT.

--
In memoriam Alan J. Flavell
http://www.alanflavell.org.uk/charset/
Jukka K. Korpela
#21: Oct 25 '08
Andreas Prilop wrote:
Quote:
Google translates <TT> but it does not translate <CODE> -
even without any class=notranslate.
This is another point for semantic markup with CODE
instead of just TT.
There's a logical gap here, though. Computer code may well contain comments,
which are (in theory at least) supposed to be in some human language and
understandable to speakers of that language. If <CODE> implies
non-translation, then there is no way, even with explicit markup, to specify
that comments be translated.

The page http://www.google.com/intl/en/help/faq_translation.html describes
class=notranslate but no attribute for turning translation on (inside an
element that is treated as nontranslatable). Looks like command-oriented tag
design, which even forgot to provide a way to give the opposite command.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Andreas Prilop
#22: Nov 10 '08
On Sat, 25 Oct 2008, Jukka K. Korpela wrote:
Quote:
The page http://www.google.com/intl/en/help/faq_translation.html
describes class=notranslate
Another observation: When I have

<table dir="ltr" lang="fr" class="notranslate">

Google will still mess around with it. On translating the page
from English to Arabic or Hebrew, Google changes the direction
of the table to right-to-left and the table is f*cked up.

--
In memoriam Alan J. Flavell
http://www.alanflavell.org.uk/charset/