Bytes | Software Development & Data Engineering Community

Zero width space still unsafe?

Jukka reports on
http://www.cs.tut.fi/~jkorpela/chars/spaces.html
that Internet Explorer 6 fails on the "zero width space" U+200B (&#x200B;).

Is this observation still valid? For which versions of MS Windows
does it apply? Does it depend on the encoding (charset)?
I have a test page in three encodings:
http://www.unics.uni-hannover.de/nht...temp/zwsp.html
http://www.unics.uni-hannover.de/nht...mp/zwsp.html11
http://www.unics.uni-hannover.de/nhtcapri/temp/zwsp.tis
After each letter "z" there is a "zero width space". Do you see
an empty box instead? The correct browser behaviour would be
to allow a line break after "zero width space".
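As a concrete picture of that expected behaviour, here is a minimal Python sketch of line filling that treats U+200B as an invisible break opportunity. The function name `wrap_at_zwsp` and the character-count `width` are illustrative assumptions. Real browsers measure glyph widths and apply the full Unicode line-breaking rules, which this sketch ignores.

```python
# Minimal sketch (not how any real browser works): greedy line filling in
# which U+200B is the only break opportunity and is never drawn itself.
# Ordinary spaces and all other line-breaking rules are ignored.
ZWSP = "\u200b"


def wrap_at_zwsp(text: str, width: int) -> list[str]:
    """Fill lines of at most `width` characters, breaking only after a ZWSP.

    A segment longer than `width` is kept whole rather than split.
    """
    # A break is permitted after each ZWSP, so the text splits into
    # unbreakable segments at those points; the ZWSPs themselves vanish.
    segments = [seg for seg in text.split(ZWSP) if seg]
    lines: list[str] = []
    current = ""
    for seg in segments:
        if current and len(current) + len(seg) > width:
            lines.append(current)  # segment does not fit: start a new line
            current = seg
        else:
            current += seg
    if current:
        lines.append(current)
    return lines


print(wrap_at_zwsp("abcz\u200bdefz\u200bghiz\u200b", 8))
# ['abczdefz', 'ghiz']
```

The point of the sketch is that no glyph, box or space ever appears in the output lines; the ZWSP only decides where a line may end.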
http://validator.w3.org does not recognize ISO-8859-11.
Why not?
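On the encoding question: neither ISO-8859-11 nor TIS-620 contains U+200B, so in those two versions of the page it can only appear as a numeric character reference. The following Python sketch shows the difference using the standard codecs `utf-8`, `iso8859_11` and `tis_620`. It does not claim to reproduce the actual test pages; the mapping of the three URLs to these encodings is an assumption.

```python
# Sketch: how "z" + ZERO WIDTH SPACE could be serialized in three encodings.
# Characters a charset lacks are replaced by decimal &#NNNN; references.
TEST_TEXT = "z\u200bz\u200bz\u200b"  # each "z" followed by U+200B


def serialize(text: str, encoding: str) -> bytes:
    """Encode `text`; characters absent from the charset become &#NNNN;."""
    return text.encode(encoding, errors="xmlcharrefreplace")


for enc in ("utf-8", "iso8859_11", "tis_620"):
    print(enc, serialize(TEST_TEXT, enc))
```

In UTF-8 the character is the byte sequence E2 80 8B; in the two Thai charsets it becomes `&#8203;`, so a browser has to resolve the reference before it can treat the character as a break opportunity.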

Jul 23 '05
Andreas Prilop <nh******@rrzn-user.uni-hannover.de> wrote:
> Jukka reports on
> http://www.cs.tut.fi/~jkorpela/chars/spaces.html
> that Internet Explorer 6 fails on the "zero width space" U+200B


... in "normal" conditions, yes. By "normal" I mean that the font used
is not Arial Unicode MS or Lucida Sans Unicode (or some special font).

It seems to me that the behavior mostly depends on fonts, which in turn
depend on many things. If an author style sheet suggests
font-family: Arial Unicode MS, Lucida Sans Unicode;
then I would say that the great majority of users would see the
document rendered properly in this respect. But such settings may have
drawbacks.

The problem, as I understand it, is this:
- IE 6 (and even IE 4 and IE 5) knows the basic property of U+200B that
a line break is permitted after it
- however it does not know that it has zero width so that the browser
need not render anything for it
- so it uses whatever the font in use has for the character
- and it fails to scan through the available fonts to pick up one that
contains a glyph for the character.
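The four steps above can be modelled in a few lines. This is a hypothetical Python sketch, not how MSIE or any real renderer is implemented: fonts are reduced to made-up sets of code points (`FontA`, `FontB`), and `pick_font` applies the two fixes the list implies, namely skipping glyph lookup for zero-width format characters and scanning the whole fallback list.

```python
import unicodedata
from typing import Optional

# Hypothetical model: a font is reduced to the set of code points it covers.
# "FontA"/"FontB" and their coverage are made up for illustration.
FONTS: dict[str, set[int]] = {
    "FontA": set(range(0x0020, 0x007F)),  # printable ASCII only
    "FontB": set(range(0x0E01, 0x0E5C)),  # roughly the Thai block
}
MISSING = "<missing-glyph box>"


def pick_font(ch: str, preferred: list[str], fonts: dict[str, set[int]]) -> Optional[str]:
    """Choose a font for `ch`; None means nothing needs to be drawn."""
    # Fix 1: a zero-width format character (U+200B is category Cf in
    # current Unicode) needs no glyph, so no font is consulted at all.
    if unicodedata.category(ch) == "Cf":
        return None
    # Fix 2: scan the whole fallback list instead of settling for
    # whatever the first font happens to contain.
    for name in preferred:
        if ord(ch) in fonts.get(name, set()):
            return name
    return MISSING


for ch in ("z", "\u0e01", "\u200b"):
    print(f"U+{ord(ch):04X} ->", pick_font(ch, ["FontA", "FontB"], FONTS))
```

The failure described above corresponds to dropping both fixes: the ZWSP goes through glyph lookup in the current font only, and the "missing glyph" box is what gets drawn.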

So my practical conclusion is that U+200B is not ready for prime time,
and if it is important to suggest permissible line breaks in a long
string, the nonstandard <wbr> is still the practical solution.
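For a long string such as a URL, inserting break hints can be automated. A minimal Python sketch, assuming a hypothetical helper `add_break_hints` and an arbitrary fixed interval (a real page would more likely break after "/" or "."); the marker can be `<wbr>` or the ZWSP reference `&#x200B;`.

```python
import html


def add_break_hints(s: str, every: int, marker: str = "<wbr>") -> str:
    """HTML-escape `s` and put `marker` between chunks of `every` characters.

    `marker` would be "<wbr>" (nonstandard) or "&#x200B;" (ZERO WIDTH SPACE).
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    # Escape chunk by chunk so a marker can never land inside an entity
    # such as &amp;.
    chunks = [s[i:i + every] for i in range(0, len(s), every)]
    return marker.join(html.escape(chunk) for chunk in chunks)


print(add_break_hints("http://example.com/a/very/long/path", 10))
# http://exa<wbr>mple.com/a<wbr>/very/long<wbr>/path
```

Switching between the two approaches is then a one-argument change, which makes it easy to test both in the browsers discussed here.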

For some additional notes, see
http://www.cs.tut.fi/~jkorpela/html/nobr.html#zwsp
where I mention that the HTML 4.01 specification explicitly leaves the
rendering of ZWSP (as one of the white space characters for which
rendering is _not_ defined) undefined.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 23 '05 #11
On Mon, 20 Dec 2004 21:23:36 +0000, "Alan J. Flavell"
<fl*****@ph.gla.ac.uk> wrote:
> On Mon, 20 Dec 2004, Jim Ley wrote: [...]
>> Is it perhaps font related?
>
> Could well be - I'm afraid my understanding of Windows internals
> is quite lacking...
The basic fact is that there is no single person who knows how
Windows is supposed to work today, not even within MS itself.

That comes as a result of "outsourcing" of coding work. Most parts of
MS products are today produced in so-called low-cost countries: India,
Russia, China and every other country that is willing to sell the souls
of their people just to get the money in.

For quite some time now it has been all about the money, and protection
of the "monopoly". Heck, MS is in a position of "full control" over just
about every hard disk manufacturer in the world. This is proved by the
fact that it is cheaper to buy a new HD with Win-something preinstalled
than it is to get the same drive all blank from the start :-)
> - most of what I think I've grasped has been done by experimenting.


So have we all, but the target keeps moving around :-)

Allow me to predict (based on the last few days' "experimenting") that,
given the right tool, every Win NT/XP user can find at least 1000
dead entries in his registry database.

The "registry database" is just another con played on MS users, one that
made it possible for MS to hide away all the basic idiocy that is buried
in that operating system.

From what I have found, it looks like a garbage dump for both MS and
other applications that get installed in the Win environment.

I'm pretty sure that this (ab)usage of the "registry database" was not
an original idea of Dave Cutler.

--
Rex
Jul 23 '05 #12
On Mon, 20 Dec 2004, Jukka K. Korpela wrote:
> It seems to me that the behavior mostly depends on fonts, which in turn
> depend on many things. If an author style sheet suggests
> font-family: Arial Unicode MS, Lucida Sans Unicode;
> then I would say that the great majority of users would see the
> document rendered properly in this respect. But such settings may have
> drawbacks.
I believe that Tahoma is likely to rate better than L.S.U in this
regard, whereas we shouldn't assume that most people have A.U.MS.

Whereas, if they have a font that's well tuned to their writing
system, then telling MSIE to use any of the above will be a
disservice to them. It's a difficult choice to have to make.
> So my practical conclusion is that U+200B is not ready for prime time,
In general I'd have to agree with you. However, the context was
browsing of the Thai writing system, so one might presume that anyone
interested in that would be willing to equip themselves with an
appropriate font and browser settings. The fact that it'll make a
hopeless mess for the rest of us is neither here nor there, since we
can't read it anyway. IMHO and YMMV...
> and if it is important to suggest permissible line breaks in a long
> string, the nonstandard <wbr> is still the practical solution.
I don't know why that cited Thai page claims that this non-standard
<wbr> is no longer working (for some practical value of the term
"working" ;-)

Mind you, the marker could just as well be <foobar> or <secam>, for
all that most browsers seem to care. Or <x> if you prefer less typing
;-)
> For some additional notes, see
> http://www.cs.tut.fi/~jkorpela/html/nobr.html#zwsp
> where I mention that the HTML 4.01 specification explicitly leaves the
> rendering of ZWSP (as one of the white space characters for which
> rendering is _not_ defined) undefined.


Possibly; but there are hints elsewhere that browsers are expected to
apply appropriate typography for the writing system in use, and
Thai evidently needs this, so it's still on the agenda for browser
implementers, no matter that HTML doesn't demand it in so many words.

Jul 23 '05 #13
On Mon, 20 Dec 2004 23:55:56 +0100, Jan Roland Eriksson
<jr****@newsguy.com> wrote:
> That comes as a result of "outsourcing" of coding work. Most parts of
> MS products are today produced in so-called low-cost countries: India,
> Russia, China and every other country that is willing to sell the souls
> of their people just to get the money in.
Good, I'm very, very glad that they're using low-cost developers;
almost all the problems I've seen with outsourcing have been because of
poor management by the western countries, not low-cost developers. It
certainly makes sense for them.
> Heck, MS is in a position of "full control" over just about every
> hard disk manufacturer in the world. This is proved by the fact that
> it is cheaper to buy a new HD with Win-something preinstalled than it
> is to get the same drive all blank from the start :-)
Could you tell me where I get to buy these hard disks? I've never
even seen a hard disk for sale with an operating system on it.
> Allow me to predict (based on the last few days' "experimenting") that,
> given the right tool, every Win NT/XP user can find at least 1000
> dead entries in his registry database.


I think there's a good chance that any computer user could find 1000
dead lines of config data.

Jim.
--
comp.lang.javascript FAQ - http://jibbering.com/faq/

Jul 23 '05 #14
On Mon, 20 Dec 2004, Henri Sivonen wrote:
>> http://www.unics.uni-hannover.de/nhtcapri/temp/zwsp.tis
>> After each letter "z" there is a "zero width space". Do you see
>> an empty box instead?
>
> I see a box in Firefox (trunk) on OS X.


Firefox (Solaris 9) does not display a box - it shows only the letters
and breaks, if necessary, after "z".

The MacThai character set includes the zero width space:
http://www.unicode.org/Public/MAPPIN...APPLE/THAI.TXT
If you don't mind, you might (temporarily) install Thai language
support and see what happens.

I regard the "zero width space" not as a graphic character, but as
a control character like "newline" or "zero width joiner". There's
nothing to display with these characters. What's the point of including
glyphs for "newline" or "zero width space" in a font? Consider a
program that wouldn't do a newline when the font has no glyph for it!
A bit stupid. There's something wrong with programs when they insist
on displaying certain glyphs for the control characters "newline" or
"zero width space".

The mystery is:
How are existing Thai pages written?

Jul 23 '05 #15
In article <Pine.GSO.4.44.0412211523310.12191-100000@s5b003>,
Andreas Prilop <nh******@rrzn-user.uni-hannover.de> wrote:
> On Mon, 20 Dec 2004, Henri Sivonen wrote:
>>> http://www.unics.uni-hannover.de/nhtcapri/temp/zwsp.tis
>>> After each letter "z" there is a "zero width space". Do you see
>>> an empty box instead?
>>
>> I see a box in Firefox (trunk) on OS X.
>
> Firefox (Solaris 9) does not display a box - it shows only the letters
> and breaks, if necessary, after "z".
>
> The MacThai character set includes the zero width space:
> http://www.unicode.org/Public/MAPPIN...APPLE/THAI.TXT
> If you don't mind, you might (temporarily) install Thai language
> support and see what happens.


I already have "fonts for additional languages" installed and the Thai
input methods are selectable.

Thai display in Gecko on OS X is broken:
https://bugzilla.mozilla.org/show_bug.cgi?id=225217

In general, Gecko on OS X will continue to be broken for many languages
until the gfx is migrated to ATSUI. I'm not holding my breath.
https://bugzilla.mozilla.org/show_bug.cgi?id=atsui

--
Henri Sivonen
hs******@iki.fi
http://iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jul 23 '05 #16
Andreas Prilop <nh******@rrzn-user.uni-hannover.de> wrote:
> I regard the "zero width space" not as a graphic character, but as
> a control character like "newline" or "zero width joiner".
That's a reasonable idea, but Unicode defines it as "separator, space".
> There's nothing to display with these characters.
By definition, zero width space has no width but may get expanded in
formatting.

I'd say it's dual: a printable _and_ a control character, in the same
sense as the Ascii space is.
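The general category can be checked against the Unicode Character Database, for example with Python's standard `unicodedata` module. Note that the classification has changed since this thread: current versions of the database give ZERO WIDTH SPACE the category Cf (format) rather than Zs (space separator). The exact output depends on the Unicode version bundled with the Python build; `describe` is just an illustrative helper.

```python
import unicodedata


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME: category' for a code point."""
    ch = chr(cp)
    return f"U+{cp:04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}"


# SPACE, NO-BREAK SPACE, ZWSP, ZWNJ, ZWJ and LINE SEPARATOR.
for cp in (0x0020, 0x00A0, 0x200B, 0x200C, 0x200D, 0x2028):
    print(describe(cp))
```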
> What's the point of
> including glyphs for "newline" or "zero width space" in a font?
Regarding "newline", it depends on what you mean. A program that
cannot handle Ascii CR and LF is probably so broken that nothing helps.
But the _preferred_ line separator in Unicode is LINE SEPARATOR U+2028,
and support for it in programs is fairly limited. Similar considerations
apply to ZERO WIDTH SPACE: programs might fail to recognize it as having
any particular meaning and just try to render it. For such situations, a
fallback, in the form of a glyph shape, would be useful. For zero width
space, an empty zero-width glyph is appropriate. LS is a different issue
(maybe it _should_ look like a special symbol that somehow indicates
line separation).
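Generic text processing illustrates the point. In Python's built-in string methods, used here only as an example of a program with partial knowledge of these characters, U+2028 counts as whitespace and as a line boundary, while U+200B is treated as an ordinary non-space character and never produces a break.

```python
# Python's str methods as an example of generic text handling:
# U+2028 LINE SEPARATOR is recognized, U+200B ZERO WIDTH SPACE is not.
text = "foo\u200bbar baz\u2028qux"

print(text.split())        # whitespace split: U+2028 splits, U+200B does not
print(text.splitlines())   # line split: only U+2028 acts as a line boundary
print("\u200b".isspace(), "\u2028".isspace())  # False True
```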
> There's something wrong with programs
> when they insist on displaying certain glyphs for the control
> characters "newline" or "zero width space".


They don't have adequate Unicode support, but who has?

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 23 '05 #17
On Mon, 20 Dec 2004 23:41:24 GMT, ji*@jibbering.com (Jim Ley) wrote:
> On Mon, 20 Dec 2004 23:55:56 +0100, Jan Roland Eriksson
> <jr****@newsguy.com> wrote:
>> That comes as a result of "outsourcing" of coding work. Most parts of
>> MS products are today produced in so-called low-cost countries...
> Good, I'm very, very glad that they're using low-cost developers;
> almost all the problems I've seen with outsourcing have been because of
> poor management by the western countries, not low-cost developers.
> It certainly makes sense for them.


I did not mean to imply that low-cost developers are doing a bad job;
on the contrary, in most cases.

But it's my experience from some 25 years in industrial automation that
the problems of creating a good final product are proportional to the
square of the distance between the point of management and the point of
production. It's not only "poor management" but lots of other criteria
that come into this, cultural differences not to be forgotten.
>> Heck, MS is in a position of "full control" over just about every
>> hard disk manufacturer in the world. This is proved by the fact that
>> it is cheaper to buy a new HD with Win-something preinstalled than it
>> is to get the same drive all blank from the start :-)
>
> Could you tell me where I get to buy these hard disks? I've never
> even seen a hard disk for sale with an operating system on it.


The computer store in the same block where I live could be a good start.
Their argument for selling preinstalled Win drives is that it's
cheaper, and I can always reformat the drive myself if I need it
blank.

Sweden has for a number of years been regarded as the most
Win-populated country per capita in the world. There are political
reasons for this, e.g. private PCs can be had as tax-deductible units
through one's own employer. That may have something to do with the
status of the HD market here too.

--
Rex [nuf OT for now]
Jul 23 '05 #18
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
> On Mon, 20 Dec 2004, Jukka K. Korpela wrote:
>> It seems to me that the behavior mostly depends on fonts, which in
>> turn depend on many things. If an author style sheet suggests
>> font-family: Arial Unicode MS, Lucida Sans Unicode;
>> then I would say that the great majority of users would see the
>> document rendered properly in this respect. But such settings may
>> have drawbacks.
> I believe that Tahoma is likely to rate better than L.S.U in this
> regard, whereas we shouldn't assume that most people have A.U.MS.


But on my system at least (Win98, with Tahoma probably as shipped with
Windows), Tahoma does not contain U+200B. Instead, a square is
displayed.
> Whereas, if they have a font that's well tuned to their writing
> system, then telling MSIE to use any of the above will be a
> disservice to them. It's a difficult choice to have to make.
Indeed. But at least people using MSIE would see the data (assuming the
author has correctly identified the font(s) he suggests so that each of
them contains all the glyphs needed).
> In general I'd have to agree with you. However, the context was
> browsing of the Thai writing system, so one might presume that
> anyone interested in that would be willing to equip themselves with
> an appropriate font and browser settings.
I'm afraid I have missed that part of the discussion. Surely for some
specific purposes, we need to make some fair assumptions about the
potential audience.
> I don't know why that cited Thai page claims that this non-standard
> <wbr> is no longer working (for some practical value of the term
> "working" ;-)
Perhaps because Netscape dropped support in some version(s) - but soon
restored it.
> Mind you, the marker could just as well be <foobar> or <secam>, for
> all that most browsers seem to care. Or <x> if you prefer less
> typing ;-)


Do you think so? In my test, foo<foobar>bar gets treated the same way
as foobar.

But now it's time for a really weird observation.

I used MS Word 2000 and inserted (via Insert/Character) a special
character for line break hints (I'm just assuming it's called that
in the English version; that's my back-translation), which turns
out to be U+200C ZERO-WIDTH NON-JOINER, at least when I save as HTML,
i.e. I get &#x200C;. Now that's not ZWSP, though similar. But wait...
The HTML that Word spits out contains

<p class=MsoNormal><span lang=FI>foo</span><span dir=RTL></span><span
lang=AR-SA dir=RTL>&#x200C;</span><span lang=FI>bar<span style=
'letter-spacing:3.0pt'> <o:p></o:p></span></span></p>

and while this is monstrous, it "works" in the sense that there is no box
or bar in place of the special character; instead it works as an
invisible character that permits a simple line break - _even if_ the
font used does not contain that character.

Magic? I was able to reduce this to
foo<span dir="rtl">&#x200C;</span>bar
and the same trick works for &#x200B; as well.

Can we declare this an official hack? :-) And should it be more
"semantic", with bdo instead of span?
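If the trick holds up, it is a mechanical transformation. A minimal Python sketch, assuming a hypothetical helper `wrap_break_hints_rtl` and that the input is already HTML needing no further escaping; `element="bdo"` produces the variant asked about, which nobody in this thread has tested.

```python
import re

# Hypothetical helper for the workaround described above: wrap each
# U+200B / U+200C in a right-to-left element and write it as a numeric
# character reference. Only the MSIE observation in this thread supports it.
_BREAK_HINTS = re.compile("[\u200b\u200c]")


def wrap_break_hints_rtl(text: str, element: str = "span") -> str:
    """Wrap every ZWSP/ZWNJ in `text` as <element dir="rtl">&#x...;</element>."""
    def wrap(match: re.Match) -> str:
        return f'<{element} dir="rtl">&#x{ord(match.group()):X};</{element}>'

    return _BREAK_HINTS.sub(wrap, text)


print(wrap_break_hints_rtl("foo\u200bbar"))
# foo<span dir="rtl">&#x200B;</span>bar
```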

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 23 '05 #19
On Wed, 22 Dec 2004, Jukka K. Korpela wrote:
> But on my system at least (Win98, with Tahoma probably as shipped with
> Windows), Tahoma does not contain U+200B. Instead, a square is
> displayed.
Thus confirming what I keep saying to others, that the name of a font
is no guarantee of its character repertoire, in general.
>> Mind you, the marker could just as well be <foobar> or <secam>, for
>> all that most browsers seem to care. Or <x> if you prefer less
>> typing ;-)
>
> Do you think so?


Not any longer - sorry! I'm sure I tested this, but it may have been
some years back. My apologies for posting that without checking!!
> But now it's time for a really weird observation. [...] Magic? I was able to reduce this to
> foo<span dir="rtl">&#x200C;</span>bar
> and the same trick works for &#x200B; as well.
>
> Can we declare this an official hack? :-)
Bizarre. How many other browsers do we have to try it in before
we can confidently recommend it...?
> And should it be more "semantic", with bdo instead of span?


I'll save that question for later, if I may ;-)
Jul 23 '05 #20
