
Bidirectional control (formatting) characters

Unicode Technical Report #20 http://www.unicode.org/reports/tr20/
considers bidirectional control (formatting) characters as
"not suitable for use with markup".
This has sometimes been questioned.

But I agree with UTR #20 because the bidi controls
would interfere with left-to-right markup when editing
or even viewing the source text.

Consider an example. I want to write

שבת [שאבעס]

with שאבעס in italics and שבת in bold.
The source text looks fine both with HTML markup and with
bidi control characters written as numeric references:

<span dir="rtl"><B lang="he">שבת</b> [<I>שאבעס</i>]</span>
&#x202B;<B lang="he">שבת</b> [<I>שאבעס</i>]&#x202C;

However, when I insert the bidi control characters directly,
the result is a mess when viewing the source text:

[<i/><b[<I/><"B lang="he>

A bidi-conforming browser would still display the HTML page
correctly (as intended) - but who is going to edit such a mess?

See this example on
http://www.unics.uni-hannover.de/nht...onal-text.html
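
One way to keep such a source editable is to never store the raw controls in it at all. Here is a minimal sketch in Python, assuming the controls of concern are the explicit embedding and override characters U+202A through U+202E. The names `find_bidi_controls` and `escape_bidi_controls` are made up for this illustration and are not from UTR #20 or any library:

```python
import unicodedata

# Explicit bidi embedding and override controls, U+202A..U+202E (assumed scope).
BIDI_CONTROLS = {chr(cp) for cp in range(0x202A, 0x202F)}


def find_bidi_controls(source: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for every raw bidi control in the source."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(source)
        if ch in BIDI_CONTROLS
    ]


def escape_bidi_controls(source: str) -> str:
    """Rewrite raw bidi controls as hexadecimal numeric character references."""
    return "".join(
        f"&#x{ord(ch):04X};" if ch in BIDI_CONTROLS else ch for ch in source
    )


# Hebrew letters are given as \u escapes so this file itself stays left-to-right.
raw = '\u202b<B lang="he">\u05e9\u05d1\u05ea</b>\u202c'
print(find_bidi_controls(raw))
# Print in pure ASCII so the output does not depend on the terminal's bidi support.
print(escape_bidi_controls(raw).encode("ascii", "backslashreplace").decode())
```

After escaping, the markup is stored strictly left to right, and a browser still decodes the references back to the same control characters.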

Aug 16 '06 #1
9 Replies


Andreas Prilop wrote:
> Unicode Technical Report #20 http://www.unicode.org/reports/tr20/
> considers bidirectional control (formatting) characters as
> "not suitable for use with markup".
> This has sometimes been questioned.
>
> But I agree with UTR #20 because the bidi controls
> would interfere with left-to-right markup when editing
> or even viewing the source text.
>
> Consider an example. I want to write
>
> שבת [שאבעס]
>
> with שאבעס in italics and שבת in bold.

Note: Thunderbird says you sent this as 8859-8, but has the order of the
characters messed up. It shows, reading from left to right: left
bracket; shin; bet; tav; space; left bracket; shin; alef; bet; ayin; samekh.

Then, when I hit Reply, your quoted material showed up in my response in
the correct order! In other words, the Yiddish loshen-koydesh word
"Shabbes" followed by its phonetic rendering in square brackets,
appeared in correct right-to-left sequence. Strangely, when I check
which encoding Thunderbird intends to use to send this message, none of
them are checked.

OK, I just selected 8859-8 from the list of available options--and the
letters immediately shifted into the same sequence I saw in your post.
Likewise when I choose UTF-8. In fact, now I can't identify any encoding
that restores the correct sequence. So I wonder what Thunderbird was
doing when it got the order right.

For the sake of this experiment, I'm going to send this with 8859-8
selected. Then I'm going to send another reply without selecting any
encoding.
Aug 16 '06 #2

Andreas Prilop wrote:
> Unicode Technical Report #20 http://www.unicode.org/reports/tr20/
> considers bidirectional control (formatting) characters as
> "not suitable for use with markup".
> This has sometimes been questioned.
>
> But I agree with UTR #20 because the bidi controls
> would interfere with left-to-right markup when editing
> or even viewing the source text.
>
> Consider an example. I want to write
>
> שבת [שאבעס]
>
> with שאבעס in italics and שבת in bold.
> The source text looks fine both with HTML markup and with
> bidi control characters written as numeric references:

I'm sending this response without selecting any encoding, unlike the
last one, where I selected 8859-8. At the moment, the Yiddish text
appears to be in the proper sequence.
Aug 16 '06 #3

Harlan Messinger wrote:
> Andreas Prilop wrote:
>> Unicode Technical Report #20 http://www.unicode.org/reports/tr20/
>> considers bidirectional control (formatting) characters as
>> "not suitable for use with markup".
>> This has sometimes been questioned.
>>
>> But I agree with UTR #20 because the bidi controls
>> would interfere with left-to-right markup when editing
>> or even viewing the source text.
>>
>> Consider an example. I want to write
>>
>> שבת [שאבעס]
>>
>> with שאבעס in italics and שבת in bold.
>> The source text looks fine both with HTML markup and with
>> bidi control characters written as numeric references:
>
> I'm sending this response without selecting any encoding, unlike the
> last one, where I selected 8859-8. At the moment, the Yiddish text
> appears to be in the proper sequence.

OK, I've now read both of my earlier responses, and both have the Yiddish
in the wrong order. As for the character encoding that Thunderbird
shows for each, one detail first: when I said I was sending the first
message as 8859-8, the option I picked was actually listed as
"Hebrew (ISO-8859-8-I)". That's also how Thunderbird is
reading my first reply. As for my second reply, Thunderbird says it's
"Hebrew Visual (ISO-8859-8)". That's the first time I've seen Hebrew
Visual in the encoding list.
Aug 16 '06 #4

On Wed, 16 Aug 2006, Harlan Messinger wrote:
> Note: Thunderbird says you sent this as 8859-8, but has the order of the
> characters messed up. It shows, reading from left to right: left
> bracket; shin; bet; tav; space; left bracket; shin; alef; bet; ayin; samekh.
> [...]
> So I wonder what Thunderbird was
> doing when it got the order right.

I don't know what Thunderbird is doing here. I wrote my message for
a reason in "Visual Hebrew ISO-8859-8". There is *no* bidirectional
reordering involved here - everything should go straight from
left to right. So "Hebrew Visual" should, in theory, be suited
for discussing source texts.

Although Google Groups sadly ignores the "charset" parameter
of messages, we can still use it: Go to

http://google.com/group/comp.infosys...?oe=ISO-8859-1

and then choose manually the encoding

Visual Hebrew ISO-8859-8

in your browser.
Note that it is *not* "ISO-8859-8-I" and *not* "Windows-1255".

Aug 17 '06 #5

On Thu, 17 Aug 2006, I wrote:
> I don't know what Thunderbird is doing here. I wrote my message for
> a reason in "Visual Hebrew ISO-8859-8".

Here is another one in UTF-8.

This line has HTML markup:

<span dir="rtl"><B lang="he">שבת</b> [<I>שאבעס</i>]</span>

This line has control characters written as numeric references:

&#x202B;<B lang="he">שבת</b> [<I>שאבעס</i>]&#x202C;

This line has UTF-8-encoded control characters:

‫<B lang="he">שבת</b> [<I>שאבעס</i>]‬

Or view the source text of
http://www.unics.uni-hannover.de/nht.../controls.html
where you should see that all three lines produce the same
HTML page view.
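
To check that the line with numeric references and the line with raw controls really carry the same characters, the references can be decoded and compared. This is a small Python sketch. It assumes the references are the hexadecimal forms `&#x202B;` (RLE) and `&#x202C;` (PDF), and it writes the Hebrew as \u escapes so that the code itself stays in left-to-right order:

```python
import html
import unicodedata

# Same markup twice: once with numeric references, once with raw controls.
with_references = '&#x202B;<B lang="he">\u05e9\u05d1\u05ea</b>&#x202C;'
with_raw_controls = '\u202b<B lang="he">\u05e9\u05d1\u05ea</b>\u202c'

# html.unescape resolves character references much as an HTML parser would
# for text content; the tags themselves are left untouched.
decoded = html.unescape(with_references)
same_characters = decoded == with_raw_controls

# Bidi classes of the first and last character after decoding.
bidi_classes = [unicodedata.bidirectional(ch) for ch in (decoded[0], decoded[-1])]
print(same_characters, bidi_classes)
```

The difference between the two forms is therefore only in how the source looks in an editor, not in what the browser receives after parsing.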

Aug 17 '06 #6

Andreas Prilop wrote:
> On Wed, 16 Aug 2006, Harlan Messinger wrote:
>> Note: Thunderbird says you sent this as 8859-8, but has the order of the
>> characters messed up. It shows, reading from left to right: left
>> bracket; shin; bet; tav; space; left bracket; shin; alef; bet; ayin; samekh.
>> [...]
>> So I wonder what Thunderbird was
>> doing when it got the order right.
>
> I don't know what Thunderbird is doing here. I wrote my message for
> a reason in "Visual Hebrew ISO-8859-8". There is *no* bidirectional
> reordering involved here - everything should go straight from
> left to right. So "Hebrew Visual" should, in theory, be suited
> for discussing source texts.

Wait--so you *meant* the Yiddish letters to be displayed left-to-right?
In that case there's no problem, besides the fact that when I hit Reply
they were displayed right-to-left until I selected a character encoding.
(This is the first time I'm learning of "Visual Hebrew". I don't know
what that's supposed to mean.)
Aug 17 '06 #7

Andreas Prilop wrote:
> Although Google Groups sadly ignores the "charset" parameter
> of messages, we can still use it: Go to
>
> http://google.com/group/comp.infosys...?oe=ISO-8859-1
>
> and then choose manually the encoding
>
> Visual Hebrew ISO-8859-8
>
> in your browser.
> Note that it is *not* "ISO-8859-8-I" and *not* "Windows-1255".

In Firefox: with Visual Hebrew, the Yiddish is displayed right-to-left.
With regular ISO-8859-8-I Hebrew, it's left-to-right.

Aug 17 '06 #8

On Thu, 17 Aug 2006, Harlan Messinger wrote:
> Wait--so you *meant* the Yiddish letters to be displayed left-to-right?

The letter "shin", being the first letter in both spellings,
should appear as the *right*most letter - in my Usenet postings
as well as in my web pages.
http://www.unics.uni-hannover.de/nht.../controls.html
http://www.unics.uni-hannover.de/nht...onal-text.html
http://google.com/group/comp.infosys...2c03c973cbe76a

Aug 17 '06 #9

On Thu, 17 Aug 2006, Harlan Messinger wrote:
> (This is the first time I'm learning of "Visual Hebrew". I don't know
> what that's supposed to mean.)

Hebrew written backwards, i.e. left-to-right.
The *display*, however, is as usual, i.e. the first letter is
on the right.
This may be useful for text/plain only; you should not use
"Visual Hebrew" on the web for text/html.
http://ppewww.ph.gla.ac.uk/~flavell/...direction.html

However, I didn't know that the support for "Visual Hebrew"
seems so poor in newsreaders. You don't have to implement
anything of the bidirectional algorithm: all characters
should go straight from left to right.
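
What "written backwards" means can be sketched in a few lines of Python. This assumes a line that is purely right-to-left (Hebrew letters, spaces, and brackets, with no Latin words or digits). `visual_to_logical` is a hypothetical helper, not part of any library, and mixed-direction text would need the full bidirectional algorithm instead:

```python
# Bracket pairs that must be swapped when the reading direction flips.
MIRROR = str.maketrans("()[]{}<>", ")(][}{><")


def visual_to_logical(line: str) -> str:
    """Convert a purely right-to-left line from visual to logical order.

    Assumption: the line contains no left-to-right runs (Latin text, digits).
    """
    return line[::-1].translate(MIRROR)


# Visual order: characters stored as they appear on screen from left to right,
# so the first letter of each word (shin, U+05E9) is stored last.
visual = "[\u05e1\u05e2\u05d1\u05d0\u05e9] \u05ea\u05d1\u05e9"
logical = visual_to_logical(visual)

# Print code points instead of glyphs so the result reads left to right.
print([f"U+{ord(ch):04X}" for ch in logical])
```

A display that supports Visual Hebrew does nothing of the kind; it simply paints the stored characters from left to right, which is why no part of the bidirectional algorithm is needed.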

Read more at
http://www.nirdagan.com/hebrew/
http://www.nirdagan.com/hebrew/compare

Aug 17 '06 #10
