
non-breaking hyphen

Hi. I found the following when trying to learn if there is such a thing as a
non-breaking hyphen. Apparently Unicode has a &#8209; but that is not
well-supported, especially in older browsers. Somebody somewhere said:

Alternately, you can use CSS to declare a class having:

.nowrap { white-space:nowrap }

... and then wrap the compound word in a <span class=nowrap></span> tag (or
any other suitable inline tag). You can also try { white-space:pre } ...

I wasn't sure where to post this, because part of the question is about the
character entity that apparently is NOT defined in html? However, what about
the CSS idea for non-wrapping? On one of my pages
www.TheBicyclingGuitarist.net/newstuff.htm I give credit to some folks at
comp.infosystems.www.authoring.site-design. I want the hyphen in between
site and design to be a non-breaking one.
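
A minimal sketch of the quoted CSS suggestion, assuming CSS is enabled in the browser. The class name `nowrap` comes from the quoted advice; the sentence around the newsgroup name is made up for illustration.

```html
<!-- in the document head -->
<style type="text/css">
  /* Suppress line breaks inside any element carrying this class */
  .nowrap { white-space: nowrap; }
</style>

<!-- in the body -->
<p>
  Thanks to the folks at
  <span class="nowrap">comp.infosystems.www.authoring.site-design</span>
  for their help.
</p>
```

The hyphen in the markup stays a plain HYPHEN-MINUS, so only the rendering changes. With CSS disabled, the span has no effect and the name may still break after the hyphen.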

Chris Watson a.k.a. "The Bicycling Guitarist"

Jul 23 '05 #1
27 Replies


On Fri, 12 Nov 2004 18:19:08 GMT, The Bicycling Guitarist
<Ch***@TheBicyclingGuitarist.net> wrote:
> Hi. I found the following when trying to learn if there is such a thing
> as a non-breaking hyphen. Apparently Unicode has a &#8209; but that is not
> well-supported, especially in older browsers. Somebody somewhere said:
>
> Alternately, you can use CSS to declare a class having:
>
> .nowrap { white-space:nowrap }


You can do that. There are other solutions. I use <nobr>the stuff that has
to stay in one line</nobr>. Sure, the W3C-validator keeps warning that
<nobr> is not a valid element in HTML4.01. If you get bugged by this
nagging by the validator, just write your own DTD. I did. The validator is
very nice to me now ;-) .

--
PretLetters <http://home.wanadoo.nl/b.de.zoete/>
Webontwerp <http://home.wanadoo.nl/b.de.zoete/html/webontwerp.html>
Zweefvliegen <http://home.wanadoo.nl/b.de.zoete/html/vliegen.html>
DTD <http://home.wanadoo.nl/b.de.zoete/dtd/not_so_strict.dtd>
Jul 23 '05 #2

On Fri, 12 Nov 2004 18:19:08 GMT, "The Bicycling Guitarist"
<Ch***@TheBicyclingGuitarist.net> wrote:
> Hi. I found the following when trying to learn if there is such a thing as a
> non-breaking hyphen. Apparently Unicode has a &#8209; but that is not
> well-supported, especially in older browsers. Somebody somewhere said:
>
> Alternately, you can use CSS to declare a class having:
>
> .nowrap { white-space:nowrap }
>
> ... and then wrap the compound word in a <span class=nowrap></span> tag (or
> any other suitable inline tag). You can also try { white-space:pre } ...
>
> I wasn't sure where to post this, because part of the question is about the
> character entity that apparently is NOT defined in html? However, what about
> the CSS idea for non-wrapping? On one of my pages
> www.TheBicyclingGuitarist.net/newstuff.htm I give credit to some folks at
> comp.infosystems.www.authoring.site-design. I want the hyphen in between
> site and design to be a non-breaking one.


Using CSS white-space:nowrap seems to work well in browsers that are at
all recent. There is however a small confusion factor: the CSS2 spec
says that white-space only applies to block-level elements. For
white-space:pre this is sensible, for white-space:nowrap it is stupid.
But the browser writers seem to have decided to ignore that line of the
spec and do something sensible. (AFAIK)

--
Stephen Poley

http://www.xs4all.nl/~sbpoley/webmatters/
Jul 23 '05 #3

Stephen Poley <sb******************@xs4all.nl> wrote:
> Using CSS white-space:nowrap seems to work well in browsers that are at
> all recent.

Yes, when CSS is enabled.

> There is however a small confusion factor: the CSS2 spec
> says that white-space only applies to block-level elements.


It has long been said that this was a mistake and they probably didn't
mean to say so. But due to the processes of the W3C, the W3C
recommendation is currently the CSS2 spec but the W3C really _means_ that
everyone should use the CSS 2.1 draft ("Proposed Recommendation").

The wording of the meaning of the property is vague too, especially if
you think about the name white-space. Hyphens aren't white space of
course.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 23 '05 #4

"The Bicycling Guitarist" <Ch***@TheBicyclingGuitarist.net> wrote:
> - - part of the question is
> about the character entity that apparently is NOT defined in html?

The construct &#8209; is certainly defined in HTML, as denoting the
character NON-BREAKING HYPHEN U+2011. (It's not a character entity but a
character reference. The terminology seems to be permanently confused,
though.)
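
To make the terminology concrete, here is a small sketch of the ways U+2011 can be written in an HTML document. The sample word is illustrative. HTML 4.01 defines no named entity for this character, so the numeric forms are the only references available there.

```html
<!-- All three lines denote the same character, NON-BREAKING HYPHEN (U+2011) -->
<p>site&#8209;design</p>   <!-- decimal numeric character reference -->
<p>site&#x2011;design</p>  <!-- hexadecimal numeric character reference -->
<p>site‑design</p>         <!-- the character itself, in an encoding such as UTF-8 -->

<!-- By contrast, &shy; is a character entity reference: it names SOFT HYPHEN (U+00AD) -->
<p>site&shy;design</p>
```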

But the HTML specifications are very vague about the required processing
of characters. Surely U+2011 has definite semantics in Unicode, but are
HTML user agents required to observe all the semantics of Unicode
characters? The HTML specification says, oddly:
"In HTML, there are two types of hyphens: the plain hyphen and the soft
hyphen."
http://www.w3.org/TR/html4/struct/text.html#hyphenation

Later it specifies that "plain hyphen" means U+002D, or the well-known
"Ascii hyphen", or HYPHEN-MINUS to use the Unicode name. This character
has, by Unicode definition, ambiguous semantics, as its official Unicode
name suggests.

So apparently the real hyphen, HYPHEN U+2010, is _not_ a hyphen in HTML,
and neither is the non-breaking hyphen. Right? Or maybe not. Maybe the
people who wrote the HTML specification simply didn't think much about
character semantics in general. They just wrote a short, sketchy, and
misleading description that revolves around the soft hyphen (and mentions
"plain hyphen" just for contrast).

Hence, the fact that few browsers support the non-breaking hyphen cannot
really be regarded as a bug. Moreover, it's basically a _font_ issue. To
satisfy minimal requirements in processing (as far as rendering HTML
documents goes), a browser simply has to display the character, treating
it as a normal graphic character (as opposed to its eventual treatment
of "plain hyphen" as allowing a line break after it).

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 23 '05 #5

On Fri, 12 Nov 2004, Jukka K. Korpela wrote:
> The construct &#8209; is certainly defined in HTML, as denoting the
> character NON-BREAKING HYPHEN U+2011. (It's not a character entity
> but a character reference. The terminology seems to be permanently
> confused, though.)

You noticed that too...

> But the HTML specifications are very vague about the required
> processing of characters. Surely U+2011 has definite semantics in
> Unicode, but are HTML user agents required to observe all the
> semantics of Unicode characters?

I think the answer to that is "not entirely". There are some Unicode
rules that are intended to be effective in plain text, but don't have
any obvious applicability in the source "code" for a markup language,
for example.

> The HTML specification says, oddly:
> "In HTML, there are two types of hyphens: the plain hyphen and the soft
> hyphen."
> http://www.w3.org/TR/html4/struct/text.html#hyphenation

This goes right back to RFC1866, doesn't it? And has not been taken
seriously by browser implementers.

> Later it specifies that "plain hyphen" means U+002D, or the well-known
> "Ascii hyphen", or HYPHEN-MINUS to use the Unicode name. This character
> has, by Unicode definition, ambiguous semantics, as its official Unicode
> name suggests.

RFC2070 referred to iso-10646 rather than Unicode as such, but
developments have really made that distinction rather irrelevant. At
least, we might hope that they have.

> So apparently the real hyphen, HYPHEN U+2010, is _not_ a hyphen in HTML,
> and neither is the non-breaking hyphen. Right? Or maybe not. Maybe the
> people who wrote the HTML specification simply didn't think much about
> character semantics in general.

You could put it that way...

> Hence, the fact that few browsers support the non-breaking hyphen
> cannot really be regarded as a bug.


And, conversely, that authors cannot really *rely* on browser
behaviour in this regard.

Jul 23 '05 #6

Alan J. Flavell (fl*****@ph.gla.ac.uk) wrote:
: On Fri, 12 Nov 2004, Jukka K. Korpela wrote:

: > The construct &#8209; is certainly defined in HTML, as denoting the
: > character NON-BREAKING HYPHEN U+2011. (It's not a character entity
: > but a character reference. The terminology seems to be permanently
: > confused, though.)

: You noticed that too...

: > But the HTML specifications are very vague about the required
: > processing of characters. Surely U+2011 has definite semantics in
: > Unicode, but are HTML user agents required to observe all the
: > semantics of Unicode characters?

: I think the answer to that is "not entirely". There are some Unicode
: rules that are intended to be effective in plain text, but don't have
: any obvious applicability in the source "code" for a markup language,
: for example.

Surely they don't have any applicability in any text except as the
application chooses them to have applicability.

If you use vi to create a Unicode readme file that contains Arabic
characters then would a programmer cat'ing the file to the console expect
to see the Arabic going left to right or right to left? (Assuming the
console knew how to handle the character-encoding/character-set in the
first place).

If an HTML file containing mostly English text includes a paragraph that
uses characters in the range U+0600 through U+06FF (Arabic) then should
the browser be "smart enough" to display this right to left?

Somehow, having the browser recognize different characters as being
different kinds of hyphens and therefore formatting the text differently,
instead of requiring markup to tell it what to do, seems wrong to me but
that's just my spur of the moment $0.02 worth.

Jul 23 '05 #7

Barbara de Zoete wrote:
> There are other solutions. I use <nobr>the stuff that
> has to stay in one line</nobr>. Sure, the W3C-validator keeps warning
> that <nobr> is not a valid element in HTML4.01. If you get bugged by
> this nagging by the validator, just write your own DTD. I did. The
> validator is very nice to me now ;-).


You should *never* use <nobr> under any circumstances in HTML, even if
you write your own DTD. It is a presentational element, and thus has no
place in HTML. It is also a proprietary element created by either
Netscape or IE (I can't remember which), which makes it even worse.

You should use semantic class names that represent what the content is,
not how it looks. When writing markup, you have to think about *why*
the content needs to have no breaks, or other presentational features,
not what it looks like. For example, from The Bicycling Guitarist's
page, where (s)he would like the following not to wrap:

comp.infosystems.www.authoring.site-design

Usually, I would recommend using U+2011 (&#8209;, &#x2011; or ‑),
though apparently support in older browsers is an issue. More
importantly, this is the name of a newsgroup, which uses a
HYPHEN-MINUS, and that name needs to be understood if it is copied and
pasted into a newsreader. Thus, the use of a non-breaking hyphen is
actually incorrect. I tested this: I wrote that newsgroup name using a
non-breaking hyphen, and then copied it to my newsreader. The group could
not be found, but once I changed it back to a regular hyphen it worked.

Therefore, in this case, it appears that the use of a non-breaking
hyphen is in fact presentational, not semantic, so I would mark that up
like this:

<a href="news:comp.infosystems.www.authoring.site-design"
class="news">comp.infosystems.www.authoring.site-design</a>

That has the advantage of also providing a link that a user can use to
subscribe to the newsgroup.

Because it is also code that can be used by a newsreader to subscribe
to that newsgroup, you could also mark it up in <code>, or even
<kbd> depending on the context, but <kbd> would only be appropriate if
you were telling a reader to enter the text into their newsreader. In
this case, <code> seems most appropriate.

<code class="news">comp.infosystems.www.authoring.site-design</code>

If you like, you can also combine both the link and <code>, but only one
needs to have the class="news".

Finally, just apply the style to .news { ... }
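
As a rough sketch, the markup and style rule described above might be combined like this. The class name `news` is from the post; the use of `white-space: nowrap` as the declaration is an assumption based on the earlier discussion, since the post leaves the declaration block elided, and the surrounding sentence is made up for illustration.

```html
<!-- in the document head -->
<style type="text/css">
  /* Assumed declaration: keep newsgroup names on one line */
  .news { white-space: nowrap; }
</style>

<!-- in the body: link and <code> combined, with the class on the link only -->
<p>
  Credit goes to
  <a href="news:comp.infosystems.www.authoring.site-design"
     class="news"><code>comp.infosystems.www.authoring.site-design</code></a>.
</p>
```

Because the text still contains a plain HYPHEN-MINUS, the name can be copied into a newsreader unchanged.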

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://SpreadFirefox.com/ Igniting the Web
Jul 23 '05 #8

On Sat, 13 Nov 2004 03:04:59 GMT, Lachlan Hunt <sp***********@gmail.com>
wrote:
> Barbara de Zoete wrote:
>> There are other solutions. I use <nobr>the stuff that has to stay in
>> one line</nobr>. Sure, the W3C-validator keeps warning that <nobr> is
>> not a valid element in HTML4.01. If you get bugged by this nagging by
>> the validator, just write your own DTD. I did. The validator is very
>> nice to me now ;-).
>
> You should *never* use <nobr> under any circumstances in HTML, even if
> you write your own DTD. It is a presentational element, and thus has no
> place in HTML. It is also a proprietary element created by either
> Netscape or IE (I can't remember which), which makes it even worse.
>
> You should use semantic class names that represent what the content is,
> not how it looks.


I know and I don't give a damn. Sometimes one can take it too far. <nobr>
is short, easy to remember. I see no reason not to use it. No, anyone
telling me it is presentational and/or a proprietary element is not a
reason either.
Sometimes I just go with what is practical. <nobr> or <span
class="nobreak">? I prefer the first. I hate CSS-soup as much as I hate
tag-soup.

--
PretLetters <http://home.wanadoo.nl/b.de.zoete/>
Webontwerp <http://home.wanadoo.nl/b.de.zoete/html/webontwerp.html>
Zweefvliegen <http://home.wanadoo.nl/b.de.zoete/html/vliegen.html>
DTD <http://home.wanadoo.nl/b.de.zoete/dtd/not_so_strict.dtd>
Jul 23 '05 #9

Barbara de Zoete wrote:
> On Sat, 13 Nov 2004 03:04:59 GMT, Lachlan Hunt
> <sp***********@gmail.com> wrote:
>> You should *never* use <nobr> under any circumstances in HTML
>>
>> You should use semantic class names that represent what the content
>> is, not how it looks.
>
> I know and I don't give a damn.

Well, keep your bad habits to yourself. Don't advise anyone else to
use them.

> Sometimes one can take it too far.
> <nobr> is short, easy to remember. I see no reason not to use it. No,
> anyone telling me it is presentational and/or a proprietary element is
> not a reason either.
> Sometimes I just go with what is practical. <nobr> or <span
> class="nobreak">? I prefer the first.

class="nobreak" is just as bad as <nobr> in my opinion; it says nothing
about *why* it is being styled like that. The only difference is that
at least the span is valid, so I would always choose that over nobr, but
since they're rarely, if ever, the only options, I would always use
something more semantic.

> I hate CSS-soup as much as I hate tag-soup.


Using <nobr> is tag soup, so that's just being hypocritical. I've never
heard of CSS-soup before, but I'm guessing, if you didn't
just make it up, that it would refer to CSS filled with presentational
class names and IDs (which I've seen plenty of), rather than styles for
semantic elements, classes and IDs, which is how stylesheets should be
done.

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://SpreadFirefox.com/ Igniting the Web
Jul 23 '05 #10

On Sat, 13 Nov 2004 08:59:47 GMT, Lachlan Hunt <sp***********@gmail.com>
wrote:
> Barbara de Zoete wrote:
>> On Sat, 13 Nov 2004 03:04:59 GMT, Lachlan Hunt
>> <sp***********@gmail.com> wrote:
>>> You should *never* use <nobr> under any circumstances in HTML
>>>
>>> You should use semantic class names that represent what the content
>>> is, not how it looks.
>>
>> I know and I don't give a damn.
>
> Well, keep your bad habits to yourself. Don't advise anyone else to
> use them.

You can say what you want to say. I can say what I want to say. You can't
keep me from saying what I have to say. That is not how it works in
newsgroups.

>> Sometimes one can take it too far. <nobr> is short, easy to remember.
>> I see no reason not to use it. No, anyone telling me it is
>> presentational and/or a proprietary element is not a reason either.
>> Sometimes I just go with what is practical. <nobr> or <span
>> class="nobreak">? I prefer the first.
>
> class="nobreak" is just as bad as <nobr> in my opinion, it says nothing
> about *why* it is being styled like that.


The trouble is that if you add the [white-space:nowrap] (hmpf, who ever
came up with white-space in the first place?) to meaningful classes, like
.name for personal names that you don't want to break across lines, you
have to repeat it over and over again. If you put it in a class .nobreak,
you can 'recycle' this class and use it with various other IDs, classes
and selectors. Keeps the size of the style sheet down a bit.
> The only difference is that at least the span is valid,

But that's the point: so is my <nobr> :-)
Validating is a good habit to see if you didn't make any mistakes in your
markup, but I think it is overrated for anything else. It doesn't do
anything for you. The browser couldn't care less if you use <nobr>, but it
would like you to close this element. So validation for checking if there
is a closing tag for all non-empty elements is fine. As is checking for
correct use of [alt] with <img> for example.

> so I would always choose that over nobr, but since they're rarely, if
> ever, the only options, I would always use something more semantic.

I can understand that. Still, I don't see a big difference between <nobr>
and <strong> or <em>. <nobr> indicates that the content inside that
element should stay together, semantically. Not just because 'it looks
nice'.
If you are reading out loud, breathe in before you start with the part
contained in <nobr>, because you don't get to stop for a breath again
before the </nobr>. To me <nobr> is thus semantical and not presentational.
I think some of the elements that were done away with in HTML strict were
chosen somewhat arbitrarily. Like: why keep the <b> and <i> and ditch the <u>?
(Don't answer that; it's rhetorical. I've seen the discussion before.) Why
implement the <strong> and <em> and ban the <nobr>?

I would like an element like <name> to exist. I use the <nobr> with names
mainly. Personal names, addresses. You don't want a break between 's and
Gravenhage. Semantically you don't want that. You don't stop for a breath
between Lachlan and Hunt either. They go together. If there were more
elements like <name> I probably wouldn't use <nobr>. Now all I tell my
browser is 'hey, this stuff belongs together. Don't break it up.' And I
don't get to tell why. But then again, I don't get to tell why something
should have <em> either.
>> I hate CSS-soup as much as I hate tag-soup.
>
> Using <nobr> is tag soup,

No. It is an HTML element. That doesn't make using it tag soup. Especially
since I think it is semantical.

> so that's just being hypocritical. Although I've never heard of
> CSS-soup before, but I'm guessing, if you didn't just make it up,

No, I didn't make it up. There are more newsgroups than just those in the
ciwa* hierarchy, you know.

> that it would refer to CSS filled with presentational class names and
> IDs (Which I've seen plenty of),

It refers (IIRC) to a page where the actual markup has been replaced with
loads of divs with fancy class names like 'main_header' :-) Adding the
occasional <span class="bold"> will help create CSS-soup.

> rather than styles for semantic elements, classes and IDs, which is
> how stylesheets should be done.


Oh, but don't you see. I do agree with that. I love CSS and am trying to
expand my knowledge of CSS continuously. Still, sometimes I just think:
no, not being able to do this (anymore) is wrong. Like the <nobr> example.
I do think <nobr> is intended to be as much a semantical element as it is
a presentational one (like <h1> and <h2> are intended to be semantical,
but their effect is also presentational - in graphical browsers their
font-size varies).
--
PretLetters <http://home.wanadoo.nl/b.de.zoete/>
Webontwerp <http://home.wanadoo.nl/b.de.zoete/html/webontwerp.html>
Zweefvliegen <http://home.wanadoo.nl/b.de.zoete/html/vliegen.html>
DTD <http://home.wanadoo.nl/b.de.zoete/dtd/not_so_strict.dtd>
Jul 23 '05 #11

On Sat, 12 Nov 2004, Malcolm Dew-Jones wrote:
>> I think the answer to that is "not entirely". There are some
>> Unicode rules that are intended to be effective in plain text, but
>> don't have any obvious applicability in the source "code" for a
>> markup language, for example.
>
> Surely they don't have any applicability in any text except as the
> application chooses them to have applicability.

That looks like a tautology to me!

> If you use vi to create a Unicode readme file

But the "application" here is HTML, and the rules of HTML apply.
Let's not get side-tracked by a discussion of plain-text editors.
Client agents (browsers and other kinds of client) aren't supposed to
make up their own rules on any of the matters which the relevant
interworking specifications have codified. In places where the
specification says that Unicode semantics apply, that should be the
end of the matter.

> If an HTML file containing mostly English text includes a paragraph
> that uses characters in the range U+0600 through U+06FF (Arabic)
> then should the browser be "smart enough" to display this right to
> left?

http://www.w3.org/TR/html401/struct/dirlang.html#h-8.2

  If a document contains right-to-left characters, and if the user
  agent displays these characters, the user agent must use the
  bidirectional algorithm.

Slightly curious wording: the browser doesn't -have- to be capable of
displaying rtl characters, but if it is, then it -must- implement the
bidi algorithm.

> Somehow, having the browser recognize different characters as being
> different kinds of hyphens and therefore formatting the text
> differently, instead of requiring markup to tell it what to do,
> seems wrong to me


Hmmm. HTML markup isn't (primarily) supposed to tell anything "what
to do", but rather to declare to the client agents "what kind of thing
this is", so that they can do whatever is appropriate with that thing.

The few instances where HTML does seem to tell a client agent what to
do (e.g <br>, <hr>, <pre>) are considered by some to be anomalies in
the original definition of HTML.

all the best
Jul 23 '05 #12

Lachlan Hunt <sp***********@gmail.com> wrote:
> You should *never* use <nobr> under any circumstances in HTML

Here we go again. This has been debated a few times on different fora,
including the www-html list, and it seems that the W3C approach favors
the use of Unicode special characters. So instead of saying
<nobr>a/b</nobr>, we're supposed to use a/&#8288;b or the equivalent
using UTF-8 encoded characters themselves. In addition to being an odd
and clumsy way of doing a simple thing, it basically doesn't work on
current browsers. Besides, it delegates markup tasks to a _lower_ level,
namely character level.

What you seem to be saying is that we should use CSS instead, which is
more practical in the present situation, but even more awkward: we need
to invent lots of classes to please you and use
<span class="expression">a/b</span>
with
.expression { white-space: nowrap; }
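
For comparison, the three alternatives mentioned above might look like this side by side. This is only a sketch: it assumes the Unicode special character meant is WORD JOINER (U+2060), written here as the decimal reference `&#8288;`, and the class name `expression` is the one used in the post.

```html
<!-- in the document head -->
<style type="text/css">
  .expression { white-space: nowrap; }
</style>

<!-- 1. Proprietary element, not part of HTML 4.01 -->
<p><nobr>a/b</nobr></p>

<!-- 2. Character level: WORD JOINER (U+2060) forbids a break after the slash -->
<p>a/&#8288;b</p>

<!-- 3. CSS: needs a class and depends on CSS being enabled -->
<p><span class="expression">a/b</span></p>
```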
> You should use semantic class names that represent what the content
> is, not how it looks.

<nobr>a/b</nobr> says that a/b is a unit of information where all
characters belong together. It's surely _more_ semantic than the W3C
approach which moves us down to the character level.

> When writing markup, you have to think about
> *why* the content needs to have no breaks

Do I? There is a huge number of situations where line breaks are
undesirable. Do I need to invent classes for all of them? The real
problem is what they call "Unicode line breaking rules", which fairly
arbitrarily allow line breaks after different characters. If we need to
class all of the expressions where we have special characters inside
strings, things get rather unnatural. Why is the Unicode Consortium
allowed to make a bulky decision that a line break after "/" is allowed
(except in a set of cases - the rules are _really_ messy) but we are not
allowed to simply say "no, don't break here"?

> comp.infosystems.www.authoring.site-design
>
> Usually, I would recommend using the U+2011 (&#8209;, &#x2011; or
> ‑), though apparently support in older browsers is an issue,

It's not really an issue. It simply does not work on most browsers.
But in principle, the non-breaking hyphen would be a possible solution.
Not much better and not much worse than <nobr>.

> but more importantly, because it is the name of a news group which
> uses a HYPHEN-MINUS, and that name needs to be understood if it were
> copied and pasted into a news reader.

You have a good point there. Similar considerations might even apply to
the use of &nbsp; - which is universally supported by browsers but which
still might cause problems in cut & paste operations for example, since
it is by definition a character distinct from the space character.

Using <nobr> avoids the problem. So does <span> together with
white-space: nowrap, _if_ CSS is in use, but it's the clumsier and less
structured approach.

> Therefore, in this case, it appears that the use of a non-breaking
> hyphen is in fact presentational, not semantic,

Well, it is semantic, and semantically wrong, in the sense that the
actual name hasn't got that character.

> <a href="news:comp.infosystems.www.authoring.site-design"
> class="news">comp.infosystems.www.authoring.site-design</a>
>
> That has the advantage of also providing a link that a user can use
> to subscribe to the newsgroup.

Actually such links are of fairly limited usefulness for several reasons,
and linking via Google Groups using a http: URL is more practical. But I
digress.

> In this case, <code> seems most appropriate.


Is the name computer code? I think it's a borderline case, and I think
you are just interpreting the semantics of <code> very freely in order to
avoid the inevitable conclusion: in the great majority of cases, the real
alternative to <nobr> is <span>, which by definition lacks _all_
semantics. So you would probably want to suggest that class attributes
give some semantics to semantically empty elements.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 23 '05 #13

"Barbara de Zoete" <b_********@hotmail.com> wrote in
news:opshedepfdx5vgts@dunnet-q4wppud9:
> On Sat, 13 Nov 2004 08:59:47 GMT, Lachlan Hunt
> <sp***********@gmail.com> wrote:
>> Using <nobr> is tag soup,
>
> No. It is an HTML element. That doesn't make using it tag soup.
> Especially since I think it is semantical.


I wouldn't call it an _HTML_ element. It is part of a different, unnamed
language, which happens to be one many browsers understand :)
Jul 23 '05 #14

Jukka K. Korpela wrote:
> Lachlan Hunt <sp***********@gmail.com> wrote:
>> You should *never* use <nobr> under any circumstances in HTML
>
> Here we go again. This has been debated a few times on different fora,
> including the www-html list, and it seems that the W3C approach favors
> the use of Unicode special characters. So instead of saying
> <nobr>a/b</nobr>, we're supposed to use a/&#8288;b or the equivalent
> using UTF-8 encoded characters themselves. In addition to being an odd
> and clumsy way of doing a simple thing, it basically doesn't work on
> current browsers. Besides, it delegates markup tasks to _lower_ level,
> namely character level.
>
> What you seem to be saying is that we should use CSS instead, which is
> more practical in the present situation, but even more awkward: we need
> to invent lots of classes to please you and use
> <span class="expression">a/b</span>
> with
> .expression { white-space: nowrap; }


Yes, that's right. I don't see a problem with that. As long as care is
taken to ensure the number of different classes does not become
unmanageable, and classes are only made for those situations where
existing element semantics aren't quite enough, it is the correct way to
mark up a semantic document.

> <nobr>a/b</nobr> says that a/b is a unit of information where all
> characters belong together.

That sounds like you're just trying to apply semantics to an element that
is defined as purely presentational.

> It's surely _more_ semantic than the W3C approach which moves us down
> to the character level.

It depends. Some situations may be more appropriately marked up using
elements, and others may be better left at the character level. I can't
think of any specific examples, but I'm sure there would be different
circumstances like this.

>> When writing markup, you have to think about
>> *why* the content needs to have no breaks
>
> Do I?

Yes, that is the correct way to use a semantic markup language.

> There is a huge number of situations where line breaks are
> undesirable. Do I need to invent classes for all of them?

There are also a huge number of situations where I might want bold text.
Is it better to use <b> for all of them, or should I mark them up
properly by saying what each of them is?
> The real problem is what they call "Unicode line breaking rules", which
> fairly arbitrarily allow line breaks after different characters. If we
> need to class all of the expressions where we have special characters
> inside strings, things get rather unnatural. Why is the Unicode consortium
> allowed to make bulky decision that a line break after "/" is allowed
> (except in a set of cases - the rules are _really_ messy) but we are not
> allowed to simply say "no, don't break here"?

You are allowed to say that; it just needs to be said at the
presentation level. The markup level just says what it is, from which
it can be determined why and how to present it.

>> comp.infosystems.www.authoring.site-design
>>
>> Usually, I would recommend using the U+2011 (&#8209;, &#x2011; or
>> ‑), though apparently support in older browsers is an issue,
>
> It's not really an issue. It simply does not work on most browsers.

Well, I would call that an issue, but I was in fact just referring to the
point made about using it by The Bicycling Guitarist in his/her original
question.

> But in principle, the non-breaking hyphen would be a possible solution.
> Not much better and not much worse than <nobr>.

In this case, I agree.

>> but more importantly, because it is the name of a news group which
>> uses a HYPHEN-MINUS, and that name needs to be understood if it were
>> copied and pasted into a news reader.


> You have a good point there. Similar considerations might even apply to
> the use of &nbsp; - which is universally supported by browsers but which
> still might cause problems in cut & paste operations for example, since
> it is by definition a character distinct from the space character.

Yes, it is, and should be treated as a distinct character. However, from
experience, when copying a non-breaking space and pasting it into some, but
not all, programs, it does get treated as a regular space, so it doesn't
seem to cause as many problems.

>> Therefore, in this case, it appears that the use of a non-breaking
>> hyphen is in fact presentational, not semantic,
>
> Well, it is semantic, and semantically wrong, in the sense that the
> actual name hasn't got that character.
>
>> <a href="news:comp.infosystems.www.authoring.site-design"
>> class="news">comp.infosystems.www.authoring.site-design</a>
>>
>> That has the advantage of also providing a link that a user can use
>> to subscribe to the newsgroup.
>
> Actually such links are of fairly limited usefulness for several reasons,
> and linking via Google Groups using a http: URL is more practical. But I
> digress.


I find the news: URIs more useful since clicking on one will
automatically launcy my newsreader for me and open up the group, but I
guess it would probably be more practical to provide both, and somehow
clearly indicate the difference to the users.

Though one should probably take the target audience of the site into
consideration for such a decision. A technically minded group of
computer users may find news: URIs easier, and/or are more likely to
understand how to gain access to a newsgroup via other means if
necessary, but a non-technical group of users wouldn't, so they would
benefit most from a link to Google groups.
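
As a rough sketch of "provide both", the following Python builds the news: link shown earlier and a second link to a web archive. The helper names are made up for illustration, and archive_url is left to the caller because no particular archive URL format is assumed here.

```python
import html
from urllib.parse import urlsplit

GROUP = "comp.infosystems.www.authoring.site-design"


def news_link(group: str) -> str:
    """Build an <a> element that points at a news: URI for the group."""
    uri = "news:" + group
    return f'<a href="{html.escape(uri)}" class="news">{html.escape(group)}</a>'


def web_link(group: str, archive_url: str) -> str:
    """Build an <a> element pointing at a web archive of the same group.

    archive_url is supplied by the caller; no URL scheme for any
    particular archive site is assumed.
    """
    return f'<a href="{html.escape(archive_url)}">{html.escape(group)} (web archive)</a>'


parts = urlsplit("news:" + GROUP)
print(parts.scheme)  # news
print(parts.path)    # comp.infosystems.www.authoring.site-design
print(news_link(GROUP))
```

Labelling the second link in its text, as web_link does, is one way to indicate the difference to users.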
In this case, <code> seems most appropriate.


Is the name computer code? I think it's a borderline case, and I think
you are just interpreting the semantics of <code> very freely


Yes, it was a very loose interpretation; however, <code> is very loosely
defined in the spec. It just states that it is a fragment of computer
code, and I interpreted that very loosely as content that can be
processed in some meaningful way by a computer. In this case, it can be
processed to either subscribe or post to the newsgroup. I know it's not
exactly the best reason, nor the best explanation of my reason, and I'm
sure many would disagree with it.
in order to avoid the inevitable conclusion: in the great majority of cases, the real
alternative to <nobr> is <span>, which by definition lacks _all_
semantics.
As does <nobr>, so in a sense you are correct. However, in many cases
it simply requires better use of more semantic elements and classes
where appropriate.

So you would probably want to suggest that class attributes give some semantics to semantically empty elements.


Classes can be used to give author defined semantics, even to
semantically empty elements. But, semantically empty elements should
generally be avoided where possible.

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://SpreadFirefox.com/ Igniting the Web
Jul 23 '05 #15

P: n/a
Lachlan Hunt <sp***********@gmail.com> wrote:
<nobr>a/b</nobr> says that a/b is a unit of information where all
characters belong together.
That sounds like you're just trying to apply semantics to an element
that is defined as purely presentational.


That's because you have already decided so. Think about
rmdir /foo
versus
rmdir / foo
Is the difference purely presentational? That's what the Unicode
consortium thinks, when it allows the first expression to be divided as
rmdir /
foo
It's surely _more_ semantic than the W3C approach which moves us
down to the character level.


It depends. Some situations may be more appropriately marked up
using elements, and others may be better left at the character level.


Moving it to character level means that presentational features have been
wired into the document's textual content. Isn't this worse than
wiring it into markup around the content? Things may change, of
course, if we regard line breaking issues as potentially belonging to
logical structure or semantics.
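
The two approaches being compared can be put side by side in a short Python sketch. Both function names are hypothetical, and the markup version assumes a stylesheet rule along the lines of .nowrap { white-space:nowrap } as quoted at the start of the thread.

```python
import html


def nowrap_by_character(text: str) -> str:
    """Character-level approach: swap every hyphen-minus for U+2011.

    The stored text itself changes, so copies of it change too.
    """
    return text.replace("-", "\u2011")


def nowrap_by_markup(text: str, class_name: str = "nowrap") -> str:
    """Markup-level approach: keep the characters, wrap them in a span."""
    return f'<span class="{class_name}">{html.escape(text)}</span>'


name = "site-design"
print(nowrap_by_character(name) == name)  # False: the content was altered
print("-" in nowrap_by_markup(name))      # True: the hyphen-minus survives
print(nowrap_by_markup("rmdir /foo"))     # <span class="nowrap">rmdir /foo</span>
```

In the markup version the text content of the element is still the original string, which is the property the argument above turns on.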
There are also a huge number of situations where I might want bold
text.
Not really. Ignoring headings, table cells and things like that and
considering inline emphasis only, the odds are that the reason for
bolding text is strong emphasis. Whether this is too coarse a concept is
an interesting question, but it corresponds to the <strong> element.
Except for a small number of special cases, <b> is just the vulgar way of
writing <strong> (and the original designers of HTML should be blamed for
this - _they_ decided to make the logical alternative's name five times
as long as the physical alternative's name).
I find the news: URIs more useful since clicking on one will
automatically launch my newsreader for me
It won't launch any newsreader unless the browser has been configured to
use one - and this is normally _not_ handled in any default settings.
In this case, <code> seems most appropriate.


Is the name computer code? I think it's a borderline case, and I
think you are just interpreting the semantics of <code> very freely


Yes, it was a very loose interpretation, however <code> is very
loosely defined in the spec.


We agree on that, though maybe for different values of "very". But the
reason for your choosing it was that you felt that you _needed_ _any_
element that you can regard as logical. That is, in an attempt to avoid
<nobr> and <span>, you would have picked up virtually anything, even an
element that you wouldn't have dreamt of otherwise.

But it's not necessarily a bad choice.
It just states that it is a fragment of
computer code, and I interpreted that very loosely as content that
can be processed in some meaningful way by a computer.
That would mean that anything is <code>, wouldn't it? Surely you can feed
any text into a computer and process it in some meaningful way.

But a newsgroup name could be marked up as <code> because it is "computer
code" in the sense of having been _defined_ separately for use as input
to computer software, as an identifier of a group. This becomes more
obvious, perhaps, if you think how newsgroup names often have to be
distorted from the natural language expressions that they have been
derived from, e.g. by dropping accents away.

On the practical side, some automatic translation software (BabelFish)
treats text inside <code> as a literal string that remains invariant in
translation. And this is very natural and very desirable, since if we
have, say, some text about Unix, mentioning the <code>cat</code> command,
then we don't want that "cat" to become "chat" when translating into
French.
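
How a translator might keep <code> content invariant can be sketched in a few lines of Python. This is a toy under stated assumptions: the one-word glossary and the translate function are invented for illustration and say nothing about how BabelFish actually works.

```python
import re

# Toy English -> French glossary; purely illustrative.
GLOSSARY = {"cat": "chat"}

# The capturing group makes re.split keep the <code> elements in its result.
CODE_SPAN = re.compile(r"(<code>.*?</code>)", re.DOTALL)


def translate(text: str) -> str:
    """Translate words via GLOSSARY, leaving <code>...</code> untouched."""
    out = []
    for index, part in enumerate(CODE_SPAN.split(text)):
        if index % 2 == 1:
            # Odd positions hold the captured <code> elements: copy verbatim.
            out.append(part)
        else:
            out.append(re.sub(r"\w+",
                              lambda m: GLOSSARY.get(m.group(0), m.group(0)),
                              part))
    return "".join(out)


print(translate("the cat and the <code>cat</code> command"))
# the chat and the <code>cat</code> command
```

Only the markup tells the program which "cat" is a command name; with <span> or no markup at all, both occurrences would be translated.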
in order to avoid the inevitable conclusion: in the great majority
of cases, the real alternative to <nobr> is <span>, which by
definition lacks _all_ semantics.


As does <nobr>, so in a sense you are correct.


No it doesn't. Even if you regard <nobr> as purely presentational,
marking something with <nobr> says _more_ than marking it with <span>.
Just as <b class="vector"> says more than <span class="vector">. The
former says, loosely speaking, 'here we have an element with undefined
meaning, but the preferred visual rendering is bold'. It does not say
what the meaning is, but it may give a hint.
Classes can be used to give author defined semantics, even to
semantically empty elements.


What author defined semantics? The class name has no meaning; it is
simply a string. The author may have something in his mind, and someone
reading the source code might get a hint if he happens to know the
natural language from which the name had been taken. But this is
different from the hint given by <b> (or by <nobr>, even if you regard it
as presentational only), as defined by the _markup language_.
Would you understand the author defined semantics of
class="lauseke" or class="korostus"?

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 23 '05 #16

P: n/a
In article <Xn*****************************@193.229.0.31>,
"Jukka K. Korpela" <jk******@cs.tut.fi> wrote:
Similar considerations might even apply to
the use of &nbsp; - which is universally supported by browsers but which
still might cause problems in cut & paste operations for example, since
it is by definition a character distinct from the space character.

Using <nobr> avoids the problem.


If having a non-breaking space is important, isn't it important to copy
it as well?

--
Henri Sivonen
hs******@iki.fi
http://iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jul 23 '05 #17

P: n/a
The Bicycling Guitarist wrote:

Hi. I found the following when trying to learn if there is such a thing as a
non-breaking hyphen. Apparently Unicode has a ‑ but that is not
well-supported, especially in older browsers. Somebody somewhere said:

Alternately, you can use CSS to declare a class having:

.nowrap { white-space:nowrap }

... and then wrap the compound word in a <span class=nowrap></span> tag (or
any other suitable inline tag). You can also try { white-space:pre } ...

I wasn't sure where to post this, because part of the question is about the
character entity that apparently is NOT defined in html? However, what about
the CSS idea for non-wrapping? On one of my pages
www.TheBicyclingGuitarist.net/newstuff.htm I give credit to some folks at
comp.infosystems.www.authoring.site-design. I want the hyphen in between
site and design to be a non-breaking one.


I don't understand. In Mozilla, hyphens (&minus;, ISO 8859-1
&#45;, 0x2D) are non-breaking.

--

David E. Ross
<http://www.rossde.com/>

I use Mozilla as my Web browser because I want a browser that
complies with Web standards. See <http://www.mozilla.org/>.
Jul 23 '05 #18

P: n/a
David Ross <no****@nowhere.not> wrote:
I don't understand. In Mozilla, hyphens (&minus;, ISO 8859-1
&#45;, 0x2D) are non-breaking.


The entity reference &minus; denotes the minus sign, which is not a
hyphen at all.

ISO 8859-1 is irrelevant here.

The character reference &#45; denotes the hyphen-minus character
(ASCII hyphen), and 0x2D is a common way of mentioning its hexadecimal
code in several standards.
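
The distinctions above can be checked with Python's standard library. This is only a quick sketch; describe_reference is a hypothetical helper, and the decimal form &#45; is assumed as the hyphen-minus reference under discussion.

```python
import html
import unicodedata


def describe_reference(ref: str) -> tuple:
    """Decode an HTML character/entity reference; return (code point, name)."""
    char = html.unescape(ref)
    return (f"U+{ord(char):04X}", unicodedata.name(char))


for ref in ("&minus;", "&#45;", "&#x2D;", "&#x2011;"):
    print(ref, *describe_reference(ref))
# &minus; U+2212 MINUS SIGN
# &#45; U+002D HYPHEN-MINUS
# &#x2D; U+002D HYPHEN-MINUS
# &#x2011; U+2011 NON-BREAKING HYPHEN
```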

The Unicode line breaking rules allow a line break after a hyphen-minus
character, and IE (and Opera) applies this principle. The problem we are
discussing is that such breaks are often undesirable.

The rules don't imply that a program _must_ break a line after a
hyphen-minus character in any particular occasion. But IE (and Opera)
rather mechanically breaks after it.
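
The "mechanical" behaviour can be modelled with a deliberately crude Python sketch. It is not the Unicode line breaking algorithm, only a two-character toy in which a break is allowed after a space or a hyphen-minus and nowhere else; BREAK_AFTER and break_positions are names invented here.

```python
# Characters after which this toy model allows a line break.
# Real line breaking rules cover far more characters and contexts.
BREAK_AFTER = {" ", "-"}  # SPACE, HYPHEN-MINUS


def break_positions(text: str) -> list:
    """Return indexes i where a break may occur between text[i-1] and text[i]."""
    return [i for i in range(1, len(text)) if text[i - 1] in BREAK_AFTER]


print(break_positions("site-design"))       # [5]
print(break_positions("site\u2011design"))  # []
```

Under this model "site-design" can be split after "site-", while the same word written with U+2011 offers no break opportunity, which is the effect the original poster wanted.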

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 23 '05 #19

P: n/a
Lachlan Hunt <sp***********@gmail.com> writes:
Using <nobr> is tag soup, so that's just being hypocritical.


Your favourite in-itself-hypocritical spec-of-the-week-club aside, it's
no more tag soup than PI substitutes like BR or HR or mystery-meat
attributes like WIDTH and HEIGHT.
--
| ) Più Cabernet,
-( meno Internet.
| ) http://bednarz.nl/
Jul 23 '05 #20

P: n/a
Jukka K. Korpela (jk******@cs.tut.fi) wrote:
: Lachlan Hunt <sp***********@gmail.com> wrote:

: >> It's surely _more_ semantic than the W3C approach which moves us
: >> down to the character level.
: >
: > It depends. Some situations may be more appropriately marked up
: > using elements, and others may be better left at the character level.

: Moving it to character level means that presentational features have been
: wired into the document's textual content. Isn't this worse than
: wiring it into markup around the content?

Thank you for those words. That was what I was trying to get at.

No matter what the desirable semantics of html might be, it still seems
backwards to me that the "low level logic" of the individual symbols used
in the document could have more control over the presentation than the
"high level logic" of the markup language used by a tool that is very much
concerned with the presentation details of the document.
In another part of this thread I said
Surely they don't have any applicability in any text except as the
application chooses them to have applicability.

and Alan J. Flavell responded
That looks like a tautology to me!


What I meant was that the unicode standard, in my opinion, should not
define anything but the mapping of character values to the character's name
and an acceptable glyph for it. Everything else should be handled via
higher level logic. The exact details appropriate to that higher logic
would depend on the technologies being used, not on unicode. At least
that is my opinion after seeing how complex the whole issue of unicode
seems to have become, compared to the simple simple simple original idea
of solving character set problems by defining a new standard character set
that simply defined far more characters than the old standard ascii
character set and enabled this by simply requiring computers and software
to use more bits per character.

Well ok, for various reasons the characters appear to need to be able to
indicate certain things such as line breaks, but even that level of
formatting information in the character set should be de-supported, except
to define sets of reserved values available to applications to use as they
see fit (and obviously some of those values would end up having "well
known semantics").

Somehow, the original discussion seemed to touch on those ideas, that's
all.

Jul 23 '05 #21

P: n/a
Eric B. Bednarz wrote:
Lachlan Hunt <sp***********@gmail.com> writes:
Using <nobr> is tag soup, so that's just being hypocritical.
Your favourite in-itself-hypocritical spec-of-the-week-club aside,


I have no idea which "club" you're talking about. I try to avoid being
hypocritical, and if I have, could you please explain so I can correct
myself?
it's no more tag soup than PI substitutes like BR or HR or mystery-meat
attributes like WIDTH and HEIGHT.


Unlike <nobr>, those elements and attributes do actually exist in the
HTML specifications. Although, if your point is that they are also
presentational, then I would somewhat agree.

It is true that in some cases, those elements and attributes can be
considered presentational, especially given the poorly structured design
of the <br> and <hr> elements, which I'm sure has been discussed many
times before. However, if they are used correctly, they can be
reasonably semantic and their use is certainly not as bad as <nobr>.

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://SpreadFirefox.com/ Igniting the Web
Jul 23 '05 #22

P: n/a
On Mon, 14 Nov 2004, Malcolm Dew-Jones wrote:
What I meant was that the unicode standard, in my opinion, should
not define anything but the mapping of character values to the
character's name and an acceptable glyph for it.
I think that's what the iso-10646 part aims to do. You'll recall that
there were originally two separate pushes trying to address the i18n
problem: iso-10646, and Unicode, and that they sort-of spliced
themselves together. This would have been in the early 1990's,
roughly, IIRC. But the joins still show in a number of places.

The Unicode specification goes quite some way beyond merely assigning
code points to characters. It not only codifies characters (in terms
of case mapping, directionality, combining properties etc.) but also
defines a number of characters meant to exercise control functions
without corresponding to any displayable glyph (zero-width joiner and
non-joiner, directionality-control etc.). These are defined primarily
for their use in plain-text data; their applicability in particular
applications such as a markup language is less obvious, and needs to
be codified by the markup language (as indeed occurs to some extent in
HTML).
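
For readers who want to see what "goes beyond assigning code points" looks like in practice, Python's unicodedata module exposes some of these properties. A small sketch, with describe_char as a hypothetical helper name:

```python
import unicodedata


def describe_char(char: str) -> tuple:
    """Return (name, general category, bidirectional class) for one character."""
    return (unicodedata.name(char),
            unicodedata.category(char),
            unicodedata.bidirectional(char))


# Format-control characters mentioned above: no glyph of their own.
for char in ("\u200d", "\u200c", "\u202c"):
    print(f"U+{ord(char):04X}", *describe_char(char)[:2])
# U+200D ZERO WIDTH JOINER Cf
# U+200C ZERO WIDTH NON-JOINER Cf
# U+202C POP DIRECTIONAL FORMATTING Cf

# Case mapping and combining behaviour are character properties too.
print("a".upper())                     # A
print(unicodedata.combining("\u0301")) # 230 (COMBINING ACUTE ACCENT)
```

The category Cf ("format") is what marks these as control-like characters rather than displayable ones.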

You might be of the opinion that that was inadvisable - was being done
at the wrong protocol level etc. - and for sure you'd have quite a few
arguments in your favour, but I'm afraid we have to take Unicode as it
is now, whatever we might think about such details.
Everything else should be handled via higher level logic. The exact
details appropriate to that higher logic would depend on the
technologies being used, not on unicode.


Same answer, I guess. HTML -could- have said that characters like
zero-width joiner, pop directional format, etc. had no business being
in an HTML source document, and that such matters had to be resolved
at the markup or presentation level; but HTML didn't say that - quite
the contrary, in fact. For better or for worse.
Jul 23 '05 #23

P: n/a
Lachlan Hunt <sp***********@gmail.com> wrote:
Unlike <nobr>, those elements and attributes do actually exist in the
HTML specifications. Although, if your point is that they are also
presentational, then I would somewhat agree.

It is true that in some cases, those elements and attributes can be
considered presentational, especially given the poorly structured design
of the <br> and <hr> elements, which I'm sure has been discussed many
times before. However, if they are used correctly, they can be
reasonably semantic and their use is certainly not as bad as <nobr>.


UAs don't give a flying monkey if the markup is valid, proper use of
<nobr> causes no problems, it prevents several UAs from applying
ludicrous unicode breaking rules, and a custom DTD solves the errors on
validation.

So what is your argument on why it is "bad"?

--
Spartanicus
Jul 23 '05 #24

P: n/a
Lachlan Hunt <sp***********@gmail.com> writes:
Eric B. Bednarz wrote:
Your favourite in-itself-hypocritical spec-of-the-week-club aside,


I have no idea which "club" you're talking about.


I was forward-guessing that the real upshot about 'tag soup' was the
mere absence of NOBR in W3C specs.
it's no more tag soup than PI substitutes like BR or HR or mystery-meat
attributes like WIDTH and HEIGHT.


Unlike <nobr>, those elements and attributes do actually exist in the
HTML specifications.


Well, who cares a rat's private parts.
It is true that in some cases, those elements and attributes can be
considered presentational, especially given the poorly structured
design of the <br> and <hr> elements,
Well, the vocabulary of HTML being so blunt a tool is the reason that
sometimes presentational markup is better than nothing (e.g. <I> for
anything that is denoted with italics in conventional typography but
falls short of a corresponding element type in HTML is still richer than
SPAN -- or nothing). You can argue about the virtue of such issues
until the cows come home.

BR, however, is not about *descriptive markup* (in SGML: tags) at all.
It tells the application to *do* something (e.g. explode, play some
music, render a new line; in SGML: processing instructions), though one
could probably also argue that a character reference should do the
trick: the parser collapses the HTML whitespace chars and resolved
character references for CR/LF are passed to the application for literal
rendering. This -- like anything SGML related in HTML -- doesn't have
anything to do with web browsers, or real life in general, of course.
However, if they are used correctly, they can be reasonably semantic
and their use is certainly not as bad as <nobr>.


I still do not see what is bad about NOBR (or WBR, for that matter). In
the worst case scenario nothing happens.

Let's look at a slightly modified version of your earlier statement, and
pretend the double hyphen/minus was an em dash.

| Those elements and attributes--unlike NOBR--do actually exist in the
| HTML specifications.

If you observe UA behaviour and Unicode line breaking rules, you'll
realise that you need some presentational markup to the rescue:

| Those elements and attributes<wbr>--<wbr><nobr>unlike
| NOBR</nobr><wbr>--<wbr>do actually exist in the
| HTML specifications.

Neat, no? :)
--
| ) Più Cabernet,
-( meno Internet.
| ) http://bednarz.nl/
Jul 23 '05 #25

P: n/a
In <41******@news.victoria.tc.ca>, on 11/14/2004
at 11:10 PM, yf***@vtn1.victoria.tc.ca (Malcolm Dew-Jones) said:
No matter what the desirable semantics of html might be, it still
seems backwards to me that the "low level logic" of the individual
symbols used in the document could have more control over the
presentation than the "high level logic" of the markup language used
by a tool that is very much concerned with the presentation details
of the document.
Actually, it is backwards for the HTML to be very much concerned with
the presentation details of the document. That's not what HTML was
intended for.
What I meant was that the unicode standard, in my opinion, should not
define anything but the mapping of character values to the character's
name and an acceptable glyph for it.
I strongly disagree. That might be acceptable for character data in
your HTML, but it breaks entry of data into forms.
At least that is my opinion after seeing how complex the whole issue
of unicode seems to have become,


It isn't just Unicode, and it didn't "become" complex; it was always
complex. You're looking at it from the perspective of an Indo-European
language, and aren't seeing all of the issues.

--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to sp******@library.lspace.org

Jul 23 '05 #26

P: n/a
Shmuel (Seymour J.) Metz (sp******@library.lspace.org.invalid) wrote:
: In <41******@news.victoria.tc.ca>, on 11/14/2004
: at 11:10 PM, yf***@vtn1.victoria.tc.ca (Malcolm Dew-Jones) said:

: >No matter what the desirable semantics of html might be, it still
: >seems backwards to me that the "low level logic" of the individual
: >symbols used in the document could have more control over the
: >presentation than the "high level logic" of the markup language used
: >by a tool that is very much concerned with the presentation details
: >of the document.

: Actually, it is backwards for the HTML to be very much concerned with
: the presentation details of the document. That's not what HTML was
: intended for.

I said that HTML was used by a _tool_ that is concerned with presentation.

Such tools make extensive presentation decisions based on html, and the
ability of the tools to make correct presentation decisions in a variety
of environments has always been a prime purpose for html.
: >What I meant was that the unicode standard, in my opinion, should not
: >define anything but the mapping of character values to the character's
: >name and an acceptable glyph for it.

: I strongly disagree. That might be acceptable for character data in
: your HTML, but it breaks entry of data into forms.

How does it break the entry of data into forms?

: >At least that is my opinion after seeing how complex the whole issue
: >of unicode seems to have become,

: It isn't just Unicode, and it didn't "become" complex; it was always
: complex. You're looking at it from the perspective of an Indo-European
: language, and aren't seeing all of the issues.

That's right, I am. Various language issues should not be dealt with at
the level of character data exactly because some human language issues do
not easily map to character data. Trying to do so just makes everything
unnecessarily complicated for the languages that do map reasonably well to
such a simple system.

Those other issues should be dealt with by a higher level protocol.

If unicode had been kept simple then we all would have been using it for
all western-style languages many years ago, and all the effort currently
being spent would instead be used working on systems that work more
naturally for non-western-languages.

However, as has been pointed out, the decisions have been made and we have
to live with them.

Jul 23 '05 #27

P: n/a
In <41******@news.victoria.tc.ca>, on 11/16/2004
at 10:40 PM, yf***@vtn1.victoria.tc.ca (Malcolm Dew-Jones) said:
I said that HTML was used by a _tool_ that is concerned with
presentation.
And water is wet. Your point?
How does it break the entry of data into forms?
Because it fails to present the data as the user expects when the
user enters Unicode data that are intended to have an effect on
presentation.
That's right, I am. Various language issues should not be dealt
with at the level of character data exactly because some human
language issues do not easily map to character data.
The issues dealt with by Unicode do map easily.
Those other issues should be dealt with by a higher level protocol.
No. What you want would destroy interoperability between applications.
If unicode had been kept simple then we all would have been using it
for all western-style languages many years ago,


Why? And why would its adoption matter if it didn't include
true internationalization?

--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to sp******@library.lspace.org

Jul 23 '05 #28
