More than one language in a page

What is the correct way to mark up, say, a div or p, to indicate
that it is in a different language to the main page? Are there
any potential pitfalls with different browsers associated with
doing this? If it makes any difference, this is in reference to
a page in mixed English and French:
http://www.chem.utoronto.ca/IChO.Ontario/index.html
Oct 21 '08 #1
On 2008-10-21, David Stone <no******@domain.invalid> wrote:
> What is the correct way to mark up, say, a div or p, to indicate
> that it is in a different language to the main page?

You just do <div lang="en"> etc. What browsers actually do with that
lang attribute is not so clear. In most cases probably nothing, although
it may influence choice of font in some.

Most fonts I've seen that are used for English will also contain all the
glyphs needed for French anyway.

I don't know if aural renderers use it to influence choice of speech
synthesizer. I doubt it, but you never know.
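
A minimal sketch of the markup under discussion, assuming a page whose main language is English with one French block. The title and the sample sentences are made up for illustration; only the lang attribute itself comes from the thread.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Mixed English and French (illustrative)</title>
</head>
<body>
<!-- Inherits lang="en" from the html element. -->
<p>Welcome to the conference site.</p>

<!-- Overrides the inherited language for this block and its descendants. -->
<div lang="fr">
<p>Bienvenue sur le site de la conférence.</p>
</div>
</body>
</html>
```

Because lang is inherited by descendant elements, only the parts that differ from the page language need their own attribute.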
> Are there any potential pitfalls with different browsers associated
> with doing this? If it makes any difference, this is in reference to a
> page in mixed English and French:
> http://www.chem.utoronto.ca/IChO.Ontario/index.html
Oct 21 '08 #2
Ben C wrote:
> You just do <div lang="en"> etc. What browsers actually do with that
> lang attribute is not so clear. In most cases probably nothing, although
> it may influence choice of font in some.

Well, some languages display right-to-left so the difference there
should be significant. I believe that there are also spacing issues
around some punctuation, and word-splitting issues as well. Of course,
it's all down to the care with which the browser was coded.

--
Steve Swift
http://www.swiftys.org.uk/swifty.html
http://www.ringers.org.uk
Oct 22 '08 #3
On 2008-10-22, Swifty <st***********@gmail.com> wrote:
> Ben C wrote:
>> You just do <div lang="en"> etc. What browsers actually do with that
>> lang attribute is not so clear. In most cases probably nothing, although
>> it may influence choice of font in some.
>
> Well, some languages display right-to-left so the difference there
> should be significant.

For that you've got to use dir=rtl or "direction: rtl". lang=ar by
itself won't make any difference.

> I believe that there are also spacing issues around some punctuation,
> and word-splitting issues as well. Of course, it's all down to the
> care with which the browser was coded.

I haven't seen lang making a difference, but perhaps it should. Some
browsers use something based on Unicode Annex 14 for line-breaking, and
language is not involved in the algorithm they describe there.

See also http://www.cs.tut.fi/~jkorpela/unicode/linebr.html
Oct 22 '08 #4
On Wed, 22 Oct 2008, Ben C wrote:
> For that you've got to use dir=rtl or "direction: rtl". lang=ar by
> itself won't make any difference.

But the use of Arabic script should make a difference without the need of
specifying the writing direction.

>> I believe that there are also spacing issues around some punctuation,
>> and word-splitting issues as well. Of course, it's all down to the
>> care with which the browser was coded.

One example could be the interpretation of a quote symbol like <q>:

<p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>

should be rendered as

The word ``chef´´ is of French origin.

whereas the (incorrect)

<p lang="en">The word <span lang="fr"><q>chef</q></span> is of French origin.</p>

as

The word « chef » is of French origin.

--
Helmut Richter
Oct 22 '08 #5
Helmut Richter wrote:
> On Wed, 22 Oct 2008, Ben C wrote:
>> For that you've got to use dir=rtl or "direction: rtl". lang=ar by
>> itself won't make any difference.
>
> But the use of Arabic script should make a difference without the need of
> specifying the writing direction.
>
>>> I believe that there are also spacing issues around some punctuation,
>>> and word-splitting issues as well. Of course, it's all down to the
>>> care with which the browser was coded.
>
> One example could be the interpretation of a quote symbol like <q>:
>
> <p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>
>
> should be rendered as
>
> The word ``chef´´ is of French origin.

You mean

The word “chef” is of French origin.

:-p
H.
--
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
Ask smart questions, get good answers:
http://www.catb.org/~esr/faqs/smart-questions.html
Oct 22 '08 #6
On 2008-10-22, Helmut Richter <hh***@web.de> wrote:
> On Wed, 22 Oct 2008, Ben C wrote:
>> For that you've got to use dir=rtl or "direction: rtl". lang=ar by
>> itself won't make any difference.
>
> But the use of Arabic script should make a difference without the need of
> specifying the writing direction.

It will make a difference but it won't be quite right in all
circumstances.

For a simple Arabic string it will be OK (although left-aligned), but if
you've got Roman characters embedded in there, the bidi base direction
will be wrong.

You can see the result of bidi base direction if you try an example like
this:

<div dir="rtl">
ARABIC hello
</div>

which should appear as "hello CIBARA"

<div dir="ltr">
ARABIC hello
</div>

which should appear as "CIBARA hello".

I'm using capitals to mean strongly right-to-left characters-- of course
you'd need real Arabic in the example for it to work.

Unicode Annex 9 defines three bidi base directions: left-to-right,
right-to-left and neutral.

In HTML and CSS specifications you get left-to-right unless you specify
dir or direction respectively to get right-to-left. You can't have
neutral (except perhaps in a textarea or input).
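
The two cases above can be tried with real Arabic text. This is only a sketch: it assumes the Arabic word for "hello" as sample text (my choice, not from the thread) and writes it as numeric character references so that the example does not depend on how the file or the post is encoded.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Bidi base direction (illustrative)</title>
</head>
<body>
<!-- Base direction right-to-left: the Arabic word ends up at the right. -->
<div dir="rtl" lang="ar">&#x645;&#x631;&#x62D;&#x628;&#x627; hello</div>

<!-- Base direction left-to-right: same characters, Arabic word at the left. -->
<div dir="ltr" lang="ar">&#x645;&#x631;&#x62D;&#x628;&#x627; hello</div>
</body>
</html>
```

Both divs carry the same lang value, so any difference in layout between them comes from dir alone.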
>>> I believe that there are also spacing issues around some punctuation,
>>> and word-splitting issues as well. Of course, it's all down to the
>>> care with which the browser was coded.
>
> One example could be the interpretation of a quote symbol like <q>:
>
> <p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>
>
> should be rendered as
>
> The word ``chef´´ is of French origin.
>
> whereas the (incorrect)
>
> <p lang="en">The word <span lang="fr"><q>chef</q></span> is of French origin.</p>
>
> as
>
> The word « chef » is of French origin.

Yes and there is stuff in CSS to do all that-- see the "quotes"
property, content: open-quote, and lang pseudos in CSS 2.1.

Not sure if any of the browsers actually implement all that stuff
though.

I think Korpela recommends just typing the quote characters you want and
not bothering with <q>, but I hope I'm not misquoting him [pause for
groans].
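
A sketch of the CSS 2.1 features named above, assuming a browser that implements the quotes property, open-quote/close-quote and :lang() (support was uneven, as noted). The selectors pick the quote style from the language of the element that contains the q, which matches the outer-language rule argued for earlier in the thread; the guillemet rule follows the pattern of the example in the CSS 2.1 specification. The particular quote characters chosen are an assumption.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Language-dependent quotation marks (illustrative)</title>
<style>
/* Quote marks are chosen by the language of the element containing the q. */
:lang(en) > q { quotes: '\201C' '\201D' '\2018' '\2019'; }
:lang(fr) > q { quotes: '\AB\A0' '\A0\BB'; }
/* Generate the marks from the quotes property. */
q:before { content: open-quote; }
q:after  { content: close-quote; }
</style>
</head>
<body>
<!-- q is a child of the English paragraph: English quotes around "chef". -->
<p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>

<!-- q is a child of the French span: guillemets with no-break spaces. -->
<p lang="en">The word <span lang="fr"><q>chef</q></span> is of French origin.</p>
</body>
</html>
```

Writing the quote characters as CSS escapes keeps the style sheet pure ASCII, so it is not affected by the kind of encoding mix-up that comes up elsewhere in this thread.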
Oct 22 '08 #7
On Wed, 22 Oct 2008, Stefan Ram wrote:
> (However, »chef« as used above actually is the English
> word (because it is said that it was of French origin),
> and so it should not be marked as French.
>
> The English word »chef« is of French origin.
>
> The French word »chef« is not of French origin, it /is/ French.)

Right. I should have taken a better example.

--
Helmut Richter
Oct 22 '08 #8
On Wed, 22 Oct 2008, Hendrik Maryns wrote:
> Helmut Richter wrote:
>> One example could be the interpretation of a quote symbol like <q>:
>>
>> <p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>
>>
>> should be rendered as
>>
>> The word ``chef´´ is of French origin.
>
> You mean
>
> The word «chef» is of French origin.

No. I meant what I wrote.

1) When the quotes are in the outer text, they are English. These are also the
correct quotes (at least according to German quote rules where the *outer*
language determines the form of the quotes at least as long as the quoted
text is not a paragraph of its own).

2) Guillemets are used with a space to the enclosed text:

« chef »

In German, they are sometimes used the other way round without spaces
instead of other quotes:

»chef«

--
Helmut Richter
Oct 22 '08 #9
Helmut Richter wrote:
> On Wed, 22 Oct 2008, Hendrik Maryns wrote:
>> Helmut Richter wrote:
>>> One example could be the interpretation of a quote symbol like <q>:
>>>
>>> <p lang="en">The word <q><span lang="fr">chef</span></q> is of French origin.</p>
>>>
>>> should be rendered as
>>>
>>> The word ``chef´´ is of French origin.
>>
>> You mean
>>
>> The word «chef» is of French origin.
>
> No. I meant what I wrote.

This is interesting. I did not type «» (i.e. guillemets) at all. I
actually typed “” (i.e. proper curly open and close quotes); it seems
like your newsreader has interpreted them as guillemets anyway. Funny.
I suppose you (or me?) have an encoding problem.

H.
--
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
Ask smart questions, get good answers:
http://www.catb.org/~esr/faqs/smart-questions.html
Oct 22 '08 #10
On Wed, 22 Oct 2008, Hendrik Maryns wrote:
> This is interesting. I did not type «» (i.e. guillemets) at all. I
> actually typed «» (i.e. proper curly open and close quotes); it seems
> like your newsreader has interpreted them as guillemets anyway. Funny.
> I suppose you (or me?) have an encoding problem.

It is me who has an encoding problem¹). Had it resulted in illegible
characters (?chef?), I would have checked. But as it looked like a
possibly intended usage of guillemets I did not check. I am sorry for the
oversight.

¹) The newsreader correctly converts UTF-8 to ISO-8859-1 if the character
exists there. Other characters are converted to something the newsreader
considers appropriate. I was not aware that English quotes are converted
to guillemets: it is much too seldom that I receive text with English
quotes.

--
Helmut Richter
Oct 22 '08 #11
Ben C wrote:
> You just do <div lang="en"> etc.

That's a right thing to do, though in practical terms, it does not matter
much.

> What browsers actually do with that
> lang attribute is not so clear. In most cases probably nothing,
> although it may influence choice of font in some.

Mostly for East Asian languages, and only when the page does not set font -
and most pages do, no matter what we think about that.

> Most fonts I've seen that are used for English will also contain all
> the glyphs needed for French anyway.

Well, yes, and I would expect any browser default font to contain all French
characters, anyway.

> I don't know if aural renderers use it to influence choice of speech
> synthesizer.

Some of them use, at least optionally. But in fact, considering the web as a
whole, good algorithmic language guessing (from the content) generally
produces better results. There are so many non-English pages incorrectly
marked up as English, due to misunderstandings or, most often, due to web
authoring software defaults.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Oct 22 '08 #12
On Wed, 22 Oct 2008, Jukka K. Korpela wrote:
> Some of them use, at least optionally. But in fact, considering the web as a
> whole, good algorithmic language guessing (from the content) generally
> produces better results.

And in most contexts there are not many languages to choose from. I am now
transferring data to another CMS, and I use a simple algorithm: if there
are twice as many "the" (as single words) as "der", the language is
"en", otherwise "de". It is *much* more reliable than trusting the
language explicitly specified by the authors in the old CMS.
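
The rule just described could be sketched like this. It is an illustration only, not the script actually used in the migration: the function names countWord and guessLang are hypothetical, the sample sentence is made up, and "twice as many" is read here as "at least twice as many".

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Guessing "en" or "de" from word counts (illustrative)</title>
</head>
<body>
<p id="sample">Der Hund und der Mann gehen in der Stadt spazieren.</p>
<script>
// Hypothetical helper: count whole-word, case-insensitive occurrences.
function countWord(text, word) {
  var matches = text.match(new RegExp("\\b" + word + "\\b", "gi"));
  return matches ? matches.length : 0;
}

// "en" if "the" occurs at least twice as often as "der", otherwise "de".
function guessLang(text) {
  return countWord(text, "the") >= 2 * countWord(text, "der") ? "en" : "de";
}

// Label the sample paragraph with the guessed language.
var sample = document.getElementById("sample");
sample.lang = guessLang(sample.textContent);
</script>
</body>
</html>
```

With this reading of the rule, a text containing neither word is labelled "en"; a real migration would probably want to treat that case separately.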

--
Helmut Richter
Oct 22 '08 #13
In article
<Pi******************************@lxhri01.lrz.lrz-muenchen.de>,
Helmut Richter <hh***@web.de> wrote:
> On Wed, 22 Oct 2008, Jukka K. Korpela wrote:
>> Some of them use, at least optionally. But in fact, considering the web as a
>> whole, good algorithmic language guessing (from the content) generally
>> produces better results.
>
> And in most contexts there are not many languages to choose from. I am now
> transferring data to another CMS, and I use a simple algorithm: if there
> are twice as many "the" (as single words) as "der", the language is
> "en", otherwise "de". It is *much* more reliable than trusting the
> language explicitly specified by the authors in the old CMS.

So what everyone seems to be saying is that there isn't much practical
point in specifying page language, except (i) if it requires a particular
character set (which is specified separately), (ii) if it differs from
left-to-right direction (which is specified separately), and/or (iii) to
be nice?
Oct 23 '08 #14
On Thu, 23 Oct 2008, David Stone wrote:
> User-Agent: MT-NewsWatcher/3.5.2 (PPC Mac OS X)

When you write about such a subject, you should at least set up
your newsreader properly so that it can post and display non-English
characters correctly:
http://www.smfr.org/mtnw/docs/Mime.h...sage_with_MIME
http://www.smfr.org/mtnw/docs/TextEncoding.html

Euro sign: €
Cent sign: ¢

> So what everyone seems to be saying is that there isn't much practical
> point in specifying page language,

Not for English and French - but for other languages. Have a look
at http://www.unics.uni-hannover.de/nht...-attribute.htm
with Firefox and do some experience with the font settings in
your browser.

--
In memoriam Alan J. Flavell
http://www.alanflavell.org.uk/charset/
Oct 23 '08 #15
On Thu, 23 Oct 2008, I wrote:
> Not for English and French - but for other languages. Have a look
> at http://www.unics.uni-hannover.de/nht...-attribute.htm
> with Firefox and do some experience with the font settings in

"some experiments" of course <sigh>
Oct 23 '08 #16
On Wed, 22 Oct 2008, Stefan Ram wrote:
> In this case, one might even use Google's new attribute value:
>
> <q><span lang="fr" class="notranslate">chef</span></q>
>
> http://googlewebmastercentral.blogsp...e-barrier.html

Very good! Thank you for the hint.
Even <TBODY class="notranslate"> is recognized.
Some browsers like Internet Explorer tend to ignore TBODY.

--
In memoriam Alan J. Flavell
http://www.alanflavell.org.uk/charset/
Oct 23 '08 #17
On 2008-10-23, David Stone <no******@domain.invalid> wrote:
> In article
> <Pi******************************@lxhri01.lrz.lrz-muenchen.de>,
> Helmut Richter <hh***@web.de> wrote:
>> On Wed, 22 Oct 2008, Jukka K. Korpela wrote:
>>> Some of them use, at least optionally. But in fact, considering the web as a
>>> whole, good algorithmic language guessing (from the content) generally
>>> produces better results.
>>
>> And in most contexts there are not many languages to choose from. I am now
>> transferring data to another CMS, and I use a simple algorithm: if there
>> are twice as many "the" (as single words) as "der", the language is
>> "en", otherwise "de". It is *much* more reliable than trusting the
>> language explicitly specified by the authors in the old CMS.
>
> So what everyone seems to be saying is that there isn't much practical
> point in specifying page language, except (i) if it requires a particular
> character set (which is specified separately),

More if it requires a particular font (which is usually set explicitly
or detected separately).

> (ii) if it differs from
> left-to-right direction (which is specified separately), and/or (iii) to
> be nice?
Oct 23 '08 #18
In article
<Pi******************************@s5b004.rrzn.uni-hannover.de>,
Andreas Prilop <pr********@trashmail.net> wrote:
> On Thu, 23 Oct 2008, David Stone wrote:
>> User-Agent: MT-NewsWatcher/3.5.2 (PPC Mac OS X)
>
> When you write about such a subject, you should at least set up
> your newsreader properly so that it can post and display non-English
> characters correctly:
> http://www.smfr.org/mtnw/docs/Mime.h...sage_with_MIME
> http://www.smfr.org/mtnw/docs/TextEncoding.html
>
> Euro sign: ¤
> Cent sign: ¢

I've honestly never considered doing so, because I've always
avoided using characters in usenet posts that aren't in the
basic ASCII set. I'd just use "euros" and "cents" (or the
ubiquitous "c") instead.

However, I did find a "Send with MIME" option in the preferences,
so I checked it. Don't know if it will affect this reply, though.

I don't think I've ever needed to do a bilingual post (largely
because I am monolingual); the reason for this particular thread
is because I am currently responsible for a web site that has to
be in English and French. Parlais Frainglais, anyone?
Oct 23 '08 #19
On Thu, 23 Oct 2008, I wrote:
>> http://googlewebmastercentral.blogsp...e-barrier.html
>
> Very good! Thank you for the hint.
> Even <TBODY class="notranslate"> is recognized.

A further note:
Google translates <TT> but it does not translate <CODE> -
even without any class=notranslate .
This is another point for semantic markup with CODE
instead of just TT.

--
In memoriam Alan J. Flavell
http://www.alanflavell.org.uk/charset/
Oct 24 '08 #20
Andreas Prilop wrote:
> Google translates <TT> but it does not translate <CODE> -
> even without any class=notranslate .
> This is another point for semantic markup with CODE
> instead of just TT.

There's a logical gap here, though. Computer code may well contain comments,
which are (in theory at least) supposed to be in some human language and
understandable to speakers of that language. If <CODE> implies
non-translation, then there is no way, even with explicit markup, to specify
that comments be translated.

The page http://www.google.com/intl/en/help/faq_translation.html describes
class=notranslate but no attribute for turning translation on (inside an
element that is treated as nontranslatable). Looks like command-oriented tag
design, which even forgot to provide a way to give the opposite command.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Oct 25 '08 #21
On Sat, 25 Oct 2008, Jukka K. Korpela wrote:
> The page http://www.google.com/intl/en/help/faq_translation.html
> describes class=notranslate

Another observation: When I have

<table dir="ltr" lang="fr" class="notranslate">

Google will still mess around with it. On translating the page
from English to Arabic or Hebrew, Google changes the direction
of the table to right-to-left and the table is f*cked up.

--
In memoriam Alan J. Flavell
http://www.alanflavell.org.uk/charset/
Nov 10 '08 #22
