Zero width space still unsafe?

Jukka reports on
http://www.cs.tut.fi/~jkorpela/chars/spaces.html
that Internet Explorer 6 fails on the "zero width space" U+200B.

Is this observation still valid? For which versions of MS Windows
does it apply? Does it depend on the encoding (charset)?
I have a test page in three encodings:
http://www.unics.uni-hannover.de/nht...temp/zwsp.html
http://www.unics.uni-hannover.de/nht...mp/zwsp.html11
http://www.unics.uni-hannover.de/nhtcapri/temp/zwsp.tis
After each letter "z" there is a "zero width space". Do you see
an empty box instead? The correct browser behaviour would be
to allow a line break after "zero width space".
http://validator.w3.org does not recognize ISO-8859-11.
Why not?
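
Neither ISO-8859-11 nor TIS-620 has a code position for U+200B, so in those encodings the character can only be written as a numeric character reference. A minimal sketch of that, assuming the three test pages are meant to be UTF-8, ISO-8859-11 and TIS-620 (the post does not name them all), and using a hypothetical helper `encode_for_page`:

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE

def encode_for_page(text, charset):
    """Encode text for an HTML page; characters the charset lacks become &#N; references."""
    return text.encode(charset, errors="xmlcharrefreplace")

# Sample text (assumption): letter z, ZWSP, THAI CHARACTER KO KAI (U+0E01)
sample = "z" + ZWSP + "\u0e01"

for charset in ("utf-8", "iso8859_11", "tis_620"):
    print(charset, encode_for_page(sample, charset))
```

In UTF-8 the zero width space is stored directly as bytes, while the two Thai encodings fall back to `&#8203;`.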

Jul 23 '05
Thai is unusual in that it uses spaces between sentences, but no spaces
within sentences.

Breaking between words is done by some combination of a dictionary and
an algorithm that can recognise where a word ends (don't ask me for
details, I am not a programmer). This requires support from the
operating system. Pre-Unicode, there was a special Thai edition of
Windows. With Unicode, Thai support is built in to Windows (though not
necessarily installed by default).

Applications need to use the OS's support for Thai in order to break
between words. This works in recent browsers and in Word for Windows.
It does not work in Word 2004 for Mac because Microsoft have not yet
made use of the Thai support in Mac OS X 10.3.
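
To illustrate the dictionary idea only: the sketch below does a greedy longest-match against a two-word toy dictionary and then joins the words with U+200B as explicit break hints. The word list and the function names are assumptions made for this example; real Thai word breaking in an operating system is far more elaborate.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE

# Toy dictionary (assumption): "language" and "Thai". Real word lists are much larger.
WORDS = {"ภาษา", "ไทย"}

def segment(text, words=WORDS):
    """Split text into dictionary words using greedy longest-match."""
    max_len = max(map(len, words))
    pieces, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for end in range(min(len(text), i + max_len), i, -1):
            if text[i:end] in words:
                break
        else:
            end = i + 1  # no dictionary word starts here: emit one character
        pieces.append(text[i:end])
        i = end
    return pieces

def add_break_hints(text):
    """Insert a zero width space between the segmented words."""
    return ZWSP.join(segment(text))

print(segment("ภาษาไทย"))
```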

--
Alan Wood
http://www.alanwood.net (Unicode, special characters, pesticide names)

Jul 23 '05 #21
On Wed, 22 Dec 2004, Jukka K. Korpela wrote:
But on my system at least (Win98, with Tahoma probably as shipped with
Windows), Tahoma does not contain U+200B. Instead, a square is
displayed.
That's why fonts have a version number, too :-) The character set of
Tahoma has been enlarged with every Windows version. The version that
comes with Windows XP/2003 covers all extended Arabic characters and
is therefore well suited for all languages that use the Arabic script.
However, the context was browsing of the Thai writing system,


I'm afraid I have missed that part of the discussion.


Yes, it was hidden in personal e-mail between Alan and me :-)
Magic? I was able to reduce this to
foo<span dir="rtl">*</span>bar
and the same trick works for ​ as well.

^^^^
Did you mean ZWSP ​ or ZWJ * ?

What about
foo<span dir="rtl"></span>bar
foo*bar
?

--
Mars, unlike Earth, has no atmosphere.
The Chicago manual of style, 15th ed., p. 362

Jul 23 '05 #22
On Thu, 23 Dec 2004, Andreas Prilop wrote:
Jukka:
I'm afraid I have missed that part of the discussion.


Yes, it was hidden in personal e-mail between Alan and me :-)


Not entirely: there had been mentions of iso-8859-11 and Thai on
this thread too, although I'm not blaming Jukka for missing it.
Jul 23 '05 #23
Andreas Prilop <nh******@rrzn-user.uni-hannover.de> wrote:
Magic? I was able to reduce this to
foo<span dir="rtl">*</span>bar and the same trick works for
​ as well. ^^^^
Did you mean ZWSP ​ or ZWJ * ?


I meant ZWSP as I wrote. As far as I understand, ZWJ is a way to
_prevent_ line breaks.
What about
foo<span dir="rtl"></span>bar
Interesting idea (maybe the magic _is_ just in the dir attribute), but
IE seems to completely ignore the span element (as it should) and treat
the above as just
foobar
foo*bar
?


That was among the alternatives I tested, and there * doesn't
work as it should; instead I see roughly
foo|bar
i.e. a bar-like symbol in place of the special character. Adding <span>
markup without dir attribute does not change this. So it seems that the
magic is in the interaction between that attribute and the special
character.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 23 '05 #24
On Thu, 23 Dec 2004, Jukka K. Korpela wrote:
Magic? I was able to reduce this to
foo<span dir="rtl">*</span>bar and the same trick works for
​ as well. ^^^^
Did you mean ZWSP ​ or ZWJ * ?


I meant ZWSP as I wrote.


But didn't you write earlier that ​ is displayed as an
empty box?
As far as I understand, ZWJ is a way to _prevent_ line breaks.


No, no! ZWJ and ZWNJ have nothing to do with line breaks.
At least, they shall not; they control the shape of Arabic glyphs.

A preliminary document is here:
http://www.unics.uni-hannover.de/nhtcapri/zwnj.html

--
Mars, unlike Earth, has no atmosphere.
The Chicago manual of style, 15th ed., p. 362

Jul 23 '05 #25
Andreas Prilop <nh******@rrzn-user.uni-hannover.de> wrote:
But didn't you write earlier that ​ is displayed as an
empty box?
Yes, and using <span dir="rtl">​</span> prevents that.
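
For anyone generating markup, the workaround reported here could be applied mechanically. This is only a sketch of what the thread describes for IE; the function name `wrap_zwsp` is made up for the example, and the effect in other browsers is not covered by this thread.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE
IE_WORKAROUND = '<span dir="rtl">&#8203;</span>'  # markup reported in this thread

def wrap_zwsp(markup):
    """Replace each literal zero width space with the span-wrapped reference."""
    return markup.replace(ZWSP, IE_WORKAROUND)

print(wrap_zwsp("foo" + ZWSP + "bar"))
```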
As far as I understand, ZWJ is a way to _prevent_ line breaks.


No, no! ZWJ and ZWNJ have nothing to do with line breaks.


(My point above was that I didn't consider ZWJ since it prevents line
breaks instead of permitting them.)

Well, ZWJ _does_ prevent line breaks and ZWNJ allows line breaks where
they wouldn't otherwise be allowed, don't they? They have line breaking
behavior, even if the reason for their existence might be something
different.

MS Word (even Word 2003) seems to generate ZWNJ when I select a line
breaking hint from the Insert/Character/Special characters menu.
This might reflect some older idea of using ZWNJ for such purposes.
And a casual Web author might get the same idea, e.g. because HTML has
&zwnj; (and &zwj;) but not &zwsp;.
At least, they shall not; they control the shape of Arabic glyphs.


Or joining behavior in general, don't they?
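
The remark about HTML having &zwnj; and &zwj; but no &zwsp; can be checked against Python's table of HTML 4 named entities. A small sketch; the helper name `html4_entity_names` is invented for this example.

```python
import unicodedata
from html.entities import name2codepoint  # HTML 4 named character references

def html4_entity_names(codepoint):
    """Return the HTML 4 entity names that map to the given code point."""
    return sorted(name for name, cp in name2codepoint.items() if cp == codepoint)

for cp in (0x200B, 0x200C, 0x200D):
    names = html4_entity_names(cp) or ["(no HTML 4 entity, use a numeric reference)"]
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {', '.join(names)}")
```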

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 23 '05 #26
On Thu, 23 Dec 2004, Jukka K. Korpela wrote:
No, no! ZWJ and ZWNJ have nothing to do with line breaks.
Well, ZWJ _does_ prevent line breaks and ZWNJ allows line breaks where
they wouldn't otherwise be allowed, don't they?


No! I did refer you already to
http://www.unics.uni-hannover.de/nhtcapri/zwnj.html
which shows (among other things) that &zwnj; may be part of a
Persian word. Breaking after or before &zwnj; is not acceptable!
They have line breaking
behavior, even if the reason for their existence might be something
different.
I don't know what you mean by "line breaking behavior". Perhaps you
just mean IE's (broken) behaviour. Please refer to
http://www.unicode.org/reports/tr14/#Table1
http://www.unicode.org/Public/4.0-Up...reak-4.0.0.txt
Line breaking before and after U+200C, U+200D is prohibited.
MS Word (even Word 2003) seems to generate ZWNJ when I select a line
breaking hint from the Insert/Character/Special characters menu.


You just demonstrate (again) that Microsoft's programs are broken
as designed.
Jul 23 '05 #27
Andreas Prilop <nh******@rrzn-user.uni-hannover.de> wrote:
I don't know what you mean by "line breaking behavior".
Sorry for my confusion.
Perhaps you
just mean IE's (broken) behaviour.
Well, I guess I mainly confused ZWJ and ZWNJ with zero-width spaces.
Please refer to
http://www.unicode.org/reports/tr14/#Table1
http://www.unicode.org/Public/4.0-Up...reak-4.0.0.txt
I stand corrected, but...
Line breaking before and after U+200C, U+200D is prohibited.


...as far as I can see, they are in line breaking class CM, which means
that a line break before the character is prohibited, whereas a line
break after it may or may not be allowed, depending on the next
character.
MS Word (even Word 2003) seems to generate ZWNJ when I select a
line breaking hint from the Insert/Character/Special characters
menu.


You just demonstrate (again) that Microsoft's programs are broken
as designed.


Well, it surely looks _very_ odd now, and might explain some of my
difficulties as a book author (when I had tried to help the layout
process with such hints - which might cause serious trouble when
porting data from MS Word to a publishing program).

Luckily IE does not treat &zwnj; that way. But if you use "Save As Web
page" in MS Word, it actually generates * (= &zwnj;) from a line
breaking hint, as I mentioned, so Microsoft programs aren't quite
compatible even with other Microsoft programs. (This is really not such
a big surprise.)

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 23 '05 #28
On Tue, 4 Jan 2005, Jukka K. Korpela wrote:
Line breaking before and after U+200C, U+200D is prohibited.


...as far as I can see, they are in line breaking class CM, which means
that a line break before the character is prohibited, whereas a line
break after it may or may not be allowed, depending on the next
character.


Yes - I tacitly assumed that there are ordinary letters (class AL)
before and after U+200C, U+200D as in my examples.
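
The cases discussed above can be put into a deliberately tiny sketch. It is not an implementation of UAX #14: it knows only three classes, uses the class assignments stated in this thread (ZW for U+200B, CM for U+200C and U+200D; later versions of the standard may classify these differently), and treats every other character as an ordinary letter (AL), which is an assumption of the example.

```python
ZW, CM, AL = "ZW", "CM", "AL"

# Classes as stated in the thread; everything else is treated as AL (assumption).
CLASSES = {"\u200b": ZW, "\u200c": CM, "\u200d": CM}

def break_opportunities(text):
    """Return each index i where a line may break between text[i-1] and text[i]."""
    breaks = []
    prev = None  # effective class of the preceding character
    for i, ch in enumerate(text):
        cls = CLASSES.get(ch, AL)
        if prev == ZW and cls != ZW:
            breaks.append(i)  # a break is allowed after a zero width space
            if cls == CM:
                cls = AL      # a CM with no base character is handled like AL
        elif cls == CM and prev is not None:
            cls = prev        # CM attaches to the preceding character: no break before it
        # otherwise AL next to AL: no break
        prev = cls
    return breaks

print(break_opportunities("foo\u200bbar"))  # ZWSP between letters
print(break_opportunities("foo\u200cbar"))  # ZWNJ between letters
```

With letters on both sides, only the zero width space yields a break opportunity; ZWNJ and ZWJ yield none, which matches the assumption stated in the last post.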

Jul 23 '05 #29
