
August 16th, 2005, 12:55 PM
Bidirectional text, browsers behaving badly
In an overview of book titles in many languages, which I wrote about in
another thread, I've run across a problem with some titles in scripts
that are written from right to left, like Arabic and Hebrew.
The page in question is served as utf-8. Ideally, the Unicode BiDi
algorithm should take care of any right-to-left pieces of text. It
works well, except where a title starts with a number or punctuation,
which don't have an inherent directionality and therefore take the
directionality of the parent element they occur in.
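As a rough illustration of the point about digits and punctuation, the sketch below looks up Unicode bidi classes with Python's standard `unicodedata.bidirectional`. The sample string and the helper names `bidi_classes` and `first_strong_direction` are assumptions made for this example; they are not taken from the test page.

```python
import unicodedata

# Bidi classes that carry a strong direction: L (left-to-right),
# R (right-to-left, e.g. Hebrew) and AL (right-to-left Arabic letter).
STRONG = {"L": "ltr", "R": "rtl", "AL": "rtl"}


def bidi_classes(text):
    """Return the Unicode bidi class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


def first_strong_direction(text):
    """Return 'ltr' or 'rtl' for the first strong character, or None if there is none."""
    for ch in text:
        direction = STRONG.get(unicodedata.bidirectional(ch))
        if direction:
            return direction
    return None


# Illustrative sample only: "20,000" followed by a Hebrew word (written as escapes).
sample = "20,000 \u05de\u05d9\u05dc"
print(bidi_classes(sample[:7]))        # classes of the leading "20,000 "
print(first_strong_direction(sample))  # direction of the first strong character
```

None of the leading characters has a strong class, which is why a renderer has to fall back on the surrounding context to place them.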
According to http://www.w3.org/International/arti...-markup/#where,
inserting a right-to-left mark should do the trick. Alternatively, you
could use extra markup with a dir attribute.
I put together a little test page at http://www.phys.uu.nl/~gdevries/test/bidi/testbidi.html. It shows a
table with titles in three languages. For each language, the titles are
listed in a <ul>. The Hebrew title starting with "20,000" is the one
causing the problem [1].
The only solution that works in all the browsers I tested in (Opera
7.54, IE6, FF) is to add an extra <cite> around each of the titles.
Using only an RTL mark doesn't work in Opera; preceding that mark by a
zero-width space solves the problem in Opera, but upsets IE.
Could anyone offer their comments? Any ideas on why Opera can't handle
the rtl mark correctly?
[1] BTW, the Turkish titles don't actually refer to "Vingt mille lieues
sous les mers", but to "Deux ans de vacances"; I included these to make
clear that one *language* can be written in various *scripts*, and that
indication of the directionality, when explicitly given, must be given
at the title level, not for all titles in one language together.
Garmt de Vries.
August 16th, 2005, 01:55 PM
Re: Bidirectional text, browsers behaving badly
"Garmt de Vries" <gdv1000@hotmail.com> wrote in message
news:1124192707.493265.175960@g49g2000cwa.googlegroups.com...
> In an overview of book titles in many languages, which I wrote about in
> another thread, I've run across a problem with some titles in scripts
> that are written from right to left, like Arabic and Hebrew.

> I put together a little test page at
> http://www.phys.uu.nl/~gdevries/test/bidi/testbidi.html. It shows a
> table with titles in three languages. For each language, the titles are
> listed in a <ul>. The Hebrew title starting with "20,000" is the one
> causing the problem [1].

> The only solution that works in all the browsers I tested in (Opera
> 7.54, IE6, FF)
Solution #5 works fine in Opera 8.02.
A cursory Google search shows that Opera 7 had poor RTL support and that
earlier versions had none.
August 16th, 2005, 03:05 PM
Re: Bidirectional text, browsers behaving badly
On 16 Aug 2005, Garmt de Vries wrote:
> The page in question is served as utf-8. Ideally, the Unicode BiDi
I wonder how you come to the spelling "BiDi" with capital "D".
You might write "Bidi" or "BIDI"; but "BiDi" isn't justified.
> algorithm should take care of any right-to-left pieces of text.
The algorithm itself refers to certain Unicode control characters.
It is a common misconception that the bidirectional algorithm does
everything for you without control characters. On the contrary,
you are required to use Unicode control characters in text/plain
for certain constructions. The HTML equivalent for these control
characters is markup with the DIR attribute.
> According to
> http://www.w3.org/International/arti...-markup/#where,
> inserting a right-to-left mark should do the trick. Alternatively, you
> could use extra markup with a dir attribute.
Right. Inserting &lrm; &rlm; is easier, but HTML markup is
safer and more suitable for the _markup_ language HTML.
> I put together a little test page at
> http://www.phys.uu.nl/~gdevries/test/bidi/testbidi.html.
> "20,000"
This means 20.000 = 20.
If you mean twenty thousand, then write 20 000
(ISO 31, ISO 1000)
> Could anyone offer their comments?
Carefully read http://ppewww.ph.gla.ac.uk/~flavell/...direction.html
and closely inspect the examples at
http://ppewww.ph.gla.ac.uk/~flavell/...ir-sample.html
http://www.unics.uni-hannover.de/nht...onal-text.html
> Any ideas on why Opera can't handle the rtl mark correctly?
Opera (and other browsers) are broken in their bidi display.
Mozilla-based browsers and Internet Explorer 6 work best.
Where Mozilla and IE 6 differ, Mozilla is (as usual) right.
August 17th, 2005, 11:45 AM
Re: Bidirectional text, browsers behaving badly
Andreas Prilop wrote:
> I wonder how you come to the spelling "BiDi" with capital "D".
Slip of the keyboard...
> Inserting &lrm; &rlm; is easier but HTML markup is
> safer and more suitable for the _markup_ language HTML.
It's just a bit heavier kB-wise. In my case, I keep all the titles (in
78 languages, with hundreds of titles for some languages) in a
database. An RTL mark is easily added to those few titles that need it
(i.e. Arabic or Hebrew titles starting with a number). But adding
markup like <cite> (which would make sense semantically) would add
several hundred <cite></cite>s to the page, depending on which
language(s) the user wants to see displayed. Also, I would have to keep
track of directionality for all the titles. It's doable, of course, but
is it the smartest thing to do?
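For what it's worth, marking only the titles that need it can be sketched in a few lines of Python. This is an assumption-laden sketch rather than the code behind the page: `needs_rlm` and `with_rlm` are hypothetical helper names, the sample titles are placeholders, and the rule used (first strong character is right-to-left but is not the first character) is just one way to read "Arabic or Hebrew titles starting with a number".

```python
import unicodedata

RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK


def needs_rlm(title):
    """True if the title opens with non-strong characters and its first strong one is RTL."""
    for index, ch in enumerate(title):
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return False       # first strong character is left-to-right
        if bidi in ("R", "AL"):
            return index > 0   # RTL, but only a problem if something precedes it
    return False               # no strong character at all


def with_rlm(title):
    """Prepend a right-to-left mark only where needs_rlm says so."""
    return RLM + title if needs_rlm(title) else title


# Placeholder titles for illustration, written with escapes for the Hebrew letters.
titles = ["20,000 \u05de\u05d9\u05dc", "\u05de\u05d9\u05dc", "20,000 Leagues"]
marked = [with_rlm(t) for t in titles]
print([t.startswith(RLM) for t in marked])
```

Because the mark itself has bidi class R, running `with_rlm` twice on the same title does not add a second mark.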
> > "20,000"
>
> This means 20.000 = 20.
> If you mean twenty thousand, then write 20 000
> (ISO 31, ISO 1000)
I quote the titles exactly how they are used in real life. So if the
book has "20,000" on its cover, I put "20,000" in my list.
BTW, wouldn't you use a space that's a bit narrower than &nbsp;?
> Opera (and other browsers) are broken in their bidi display.
It's a pity...
I guess the safest way to go for now is to use markup. Even if it's a
bit heavier, it is more likely to give the intended result,
at least until all browsers have caught up with bidi requirements.
August 17th, 2005, 03:05 PM
Re: Bidirectional text, browsers behaving badly
On 17 Aug 2005, Garmt de Vries wrote:
>> Inserting &lrm; &rlm; is easier but HTML markup is
>> safer and more suitable for the _markup_ language HTML.
>
> It's just a bit heavier kB-wise. In my case, I keep all the titles (in
> 78 languages, with hundreds of titles for some languages) in a
> database.
But you need the DIR markup only for a few languages. And even if
you included "DIR=LTR" everywhere, your images will still be larger.
> But adding
> markup like <cite> (which would make sense semantically) would add
> several hundreds of <cite></cite>s to the page, depending on which
> language(s) the user wants to see displayed.
I think your markup should be <UL DIR=RTL>.
Whether you use CITE for titles is a different question.
> I quote the titles exactly how they are used in real life. So if the
> book has "20,000" on its cover, I put "20,000" in my list.
> BTW, wouldn't you use a space that's a bit narrower than &nbsp;?
Of course. Feel free to write <small>&nbsp;</small>
or to take any of these: http://www.cs.tut.fi/~jkorpela/chars/spaces.html
BTW:
Your markup <COL LANG=...> isn't useful because Mozilla doesn't
recognize it. Put the LANG attribute into other tags: TD, UL, etc.
Mozilla will then take the user-defined typeface for this
language: http://ppewww.ph.gla.ac.uk/~flavell/...s.html#Mozilla
August 18th, 2005, 09:45 AM
Re: Bidirectional text, browsers behaving badly
Andreas Prilop wrote:
> On 17 Aug 2005, Garmt de Vries wrote:
> > But adding
> > markup like <cite> (which would make sense semantically) would add
> > several hundreds of <cite></cite>s to the page, depending on which
> > language(s) the user wants to see displayed.
>
> I think your markup should be <UL DIR=RTL>.
> Whether you use CITE for titles is a different question.
Setting dir on the <ul> isn't possible, since one <ul> can contain both
rtl and ltr titles (in Turkish for example). I could set the dir for
each <li> separately, but if there is a note following the title, I'd
like it to appear on the right of the title. Having ltr on the entire
<li> screws that up. So it seems the dir attribute should really be set
on the title only, leaving me a choice between <cite> and <span>.
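A minimal sketch of that per-title approach, assuming the list items are generated by a script: the function name `render_item`, its parameters and the sample strings are hypothetical, and <cite> could just as well be <span>. The dir attribute goes on the title element only, so any note after it stays outside that element.

```python
from html import escape


def render_item(title, direction, note=None):
    """Render one <li> with dir set on the <cite> only; the note stays outside it."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    item = '<cite dir="{}">{}</cite>'.format(direction, escape(title))
    if note:
        item += " " + escape(note)
    return "<li>" + item + "</li>"


# Placeholder data for illustration: one RTL title with a note, one LTR title.
print(render_item("20,000 \u05de\u05d9\u05dc", "rtl", "(note)"))
print(render_item("Deux ans de vacances", "ltr"))
```

This still requires a stored direction for every title, which is the bookkeeping cost mentioned earlier in the thread.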
> Your markup <COL LANG=...> isn't useful because Mozilla doesn't
> recognize it. Put the LANG attribute into other tags: TD, UL, etc.
> Mozilla will then take the user-defined typeface for this
> language:
> http://ppewww.ph.gla.ac.uk/~flavell/...s.html#Mozilla
This has been discussed in another thread. You're right that the lang
on a col isn't used by most (if any) of the current browsers, but
having an extra lang=".." on each and every <td> seems too
cumbersome... I'll have a look at the page you quote; it seems useful.
August 18th, 2005, 02:45 PM
Re: Bidirectional text, browsers behaving badly
On 18 Aug 2005, Garmt de Vries wrote:
> Setting dir on the <ul> isn't possible, since one <ul> can contain both
> rtl and ltr titles (in Turkish for example).
This looks awful to me. I would rather make separate lists
"Turkish in Latin letters" and "Turkish in Arabic letters".
> This has been discussed in another thread. You're right that the lang
> on a col isn't used by most (if any) of the current browsers,
You might as well omit the LANG attribute altogether then.
Currently, the only _real_ use of LANG is to convince Mozilla
to display the corresponding element in a suitable typeface.
But Mozilla does not recognize <COL LANG=...> IIRC.
August 19th, 2005, 09:55 AM
Re: Bidirectional text, browsers behaving badly
Andreas Prilop wrote:
> On 18 Aug 2005, Garmt de Vries wrote:
>
> > Setting dir on the <ul> isn't possible, since one <ul> can contain both
> > rtl and ltr titles (in Turkish for example).
>
> This looks awful to me. I would rather make separate lists
> "Turkish in Latin letters" and "Turkish in Arabic letters".
Do you mean it looks awful in a cosmetic way? In that case, you can see
the real thing at work at http://www.phys.uu.nl/~gdevries/languages/.
There are some issues in the interface that I want to sort out, but the
layout of the table is more or less final. If something looks bad, I'd
be glad to hear it.
Or do you not like the Latin and Arabic Turkish titles being given together?
In that case, I'd have to disagree with you. Whether a title is written
in Latin, Arabic or Armenian script, the *language* is still Turkish,
and it's *languages* that I want to show on this page.
> You might as well omit the LANG attribute altogether then.
That was our conclusion as well. It's easy to leave it in, and I'll keep
it just for the sake of good semantics, and if future browsers pick up
on it, so much the better. But I wouldn't like cluttering the markup
with a lang attribute on every single <td>... It's a shame, really,
that this is so badly supported.
August 22nd, 2005, 03:45 PM
Re: Bidirectional text, browsers behaving badly
On 19 Aug 2005, Garmt de Vries wrote:
> the real thing at work at http://www.phys.uu.nl/~gdevries/languages/
| <style type="text/css">
| <!--
Your STYLE is not a comment, is it? So remove the comment markers.
August 23rd, 2005, 10:55 AM
Re: Bidirectional text, browsers behaving badly
On Mon, 22 Aug 2005, Andreas Prilop wrote:
> On 19 Aug 2005, Garmt de Vries wrote:
>
> > the real thing at work at http://www.phys.uu.nl/~gdevries/languages/
>
> | <style type="text/css">
> | <!--
>
> Your STYLE is not a comment, is it? So remove the comment markers.
In HTML/4.* as in HTML/3.2, the content model of "style" and of "script"
elements is CDATA, so those character sequences aren't parsed as comments
anyway.
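One way to see this behaviour outside a browser is with the HTML parser in Python's standard library, which likewise treats the content of "style" and "script" as character data. This is only an illustration under that assumption: it says nothing about any particular browser, and `CommentSpy` and `scan` are names invented for the example.

```python
from html.parser import HTMLParser


class CommentSpy(HTMLParser):
    """Collect what the parser reports as comments and what it reports as text."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.text = []

    def handle_comment(self, data):
        self.comments.append(data)

    def handle_data(self, data):
        self.text.append(data)


def scan(markup):
    """Parse markup and return (list of comments, concatenated text)."""
    spy = CommentSpy()
    spy.feed(markup)
    spy.close()
    return spy.comments, "".join(spy.text)


# Inside <style> the markers arrive as ordinary text, not as a comment.
print(scan("<style><!-- p { color: red } --></style>"))
# Elsewhere in the document the same markers are reported as a comment.
print(scan("<p><!-- note --></p>"))
```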
In XHTML/1.0, on the other hand, they *are* supposed to be parsed as
comments, meaning that what's contained between them is meant to be
ignored.
When I visited the cited URL, I found that what I was being sent was
supposed to be strict XHTML/1.0, being sent (server HTTP header) with a
content type of text/html (i.e. the context where "Appendix C" should be
applied). By the time that I looked, there were no "comment markers"
around the in-lined style - but I saw at least one example of javascript
where they were present.
This suggests that whatever is emitting this page doesn't understand
XHTML, and would be better advised to stay with HTML/4.01 until it does.
Unless the intention is to sabotage the "Nedstat Basic code" where this
stuff was found :-}
August 24th, 2005, 12:15 PM
Re: Bidirectional text, browsers behaving badly
Alan J. Flavell wrote:
> By the time that I looked, there were no "comment markers"
> around the in-lined style - but I saw at least one example of javascript
> where they were present.
You're right. That's a remnant of an ancient version of that page,
which I had completely forgotten about...
Garmt de Vries.