To wrap or not to wrap "C++"?

Can anyone tell me if Opera 9.5 is behaving correctly when wrapping the
word C++, eg:

C+
+

Opera 9.2 didn't wrap C++. For those who use Opera 9.5 there is a test
case at http://www.highscore.de/browsertest/cpp.html (try different window
sizes until Opera 9.5 wraps C++).

Boris
Sep 13 '08 #1
Boris wrote on 13.09.2008 16:23:
> Can anyone tell me if Opera 9.5 is behaving correctly when wrapping the
> word C++, eg:
>
> C+
> +

Opera 9.6 Weekly beta wraps
C
++
and
C+
+

Version 9.60 beta
Build 10427
Platform Win32
Operating system Windows XP

> Opera 9.2 didn't wrap C++. For those who use Opera 9.5 there is a test
> case at http://www.highscore.de/browsertest/cpp.html (try different window
> sizes until Opera 9.5 wraps C++).

--
Kind regards
Holger Jeromin
Sep 13 '08 #2
On 2008-09-13, Boris <bo****@web.de> wrote:
> Can anyone tell me if Opera 9.5 is behaving correctly when wrapping the
> word C++, eg:
>
> C+
> +

The specification that defines all this is Unicode Standard Annex #14.

Browsers don't have to follow that specification to claim they support
HTML and/or CSS, but it's the easiest way to support a large number of
world languages.

So technically, Opera is correct pretty much whatever it does, but it
looks like they are basically doing the Unicode rules.

You can read Annex 14 for yourself:

http://unicode.org/reports/tr14/

The long and short of it is that "+" has the "Line breaking class" of PR
or "Prefix" so it's treated a bit like a currency symbol.

Now, I think that means that "by default" you can't break "C+" or "+C",
based on the definition of PR.

Not sure what they mean by "by default". I haven't read the whole spec.

But if you look at the pair table in section 7.3, that says you can
break between "C+" (and "++") but not between "+C", which is just what
Opera is doing. (The class of "C" is "AL"-- "alphabetic" or something).
By 'break between "C+"' I mean break between the "C" and the "+".

Korpela is the expert on this kind of thing.

If you want to prevent wrapping, because after all C++ is a special use
of "+", use white-space: nowrap.

> Opera 9.2 didn't wrap C++. For those who use Opera 9.5 there is a test
> case at http://www.highscore.de/browsertest/cpp.html (try different
> window sizes until Opera 9.5 wraps C++).

It's possible they've been improving their support for world languages
and so sharpened up the Unicode-conformance of their line-breaking
method.
Sep 13 '08 #3
Ben C wrote:
> On 2008-09-13, Boris <bo****@web.de> wrote:
>> Can anyone tell me if Opera 9.5 is behaving correctly when wrapping
>> the word C++, eg:
>>
>> C+
>> +
>
> The specification that defines all this is Unicode Standard Annex #14.

Not really.

> Browsers don't have to follow that specification to claim they support
> HTML and/or CSS,

Thus, UAX #14 does _not_ define whether the behavior is correct or not.

HTML (or CSS) specifications do not require conformance to the Unicode
Standard. (They define things in terms of it, or rather its partial
equivalent ISO 10646, but that's a different issue.) Moreover, UAX #14,
though part of the standard, is not normative except for a few parts, so
even if Opera claimed Unicode conformance, it could wrap C++ as it likes,
formally speaking.

> but it's the easiest way to support a large number of
> world languages.

I disagree; see http://www.cs.tut.fi/~jkorpela/unicode/linebr.html for some
arguments. UAX #14 is quite a mess and basically tries to deal with
_general_ principles of line breaking. Yet its rules are often very coarse,
either preventing completely acceptable line breaks or (more often) allowing
foolish line breaks. I would say that it's not very useful except in
exceptional situations where you _must_ break a string somewhere and have no
better guidelines. Unfortunately, web browsers have started implementing
parts of UAX #14 to an increasing extent (though still not very much and
never really consistently). The old principle of treating only whitespace as
allowable break point generally works better, though it naturally fails for
languages that don't use whitespace between words - but such problems should
be solved in a different way.

> The long and short of it is that "+" has the "Line breaking class" of
> PR or "Prefix" so it's treated a bit like a currency symbol.

Yes. But the meaning of PR is described vaguely in UAX #14, and the prose
part contradicts the formal parts.

> Now, I think that means that "by default" you can't break "C+" or
> "+C", based on the definition of PR.
>
> Not sure what they mean by "by default". I haven't read the whole
> spec.

I have read the whole spec, and I'm not sure what they mean by "by default".

> But if you look at the pair table in section 7.3, that says you can
> break between "C+" (and "++") but not between "+C", which is just what
> Opera is doing. (The class of "C" is "AL"-- "alphabetic" or
> something). By 'break between "C+"' I mean break between the "C" and
> the "+".

The formal rules before the pair table imply that a "direct break" is
allowed between AL and PR as well as between PR and PR, so in "C++", a break
is permitted anywhere (between C and + as well as between + and +), and
Opera works that way. "Direct break" means that a break is allowed even if
no space intervenes.

Actually, it seems to me that the pair table, as well as the formal rules,
also permits direct break between PR and AL. They prevent a break between PR
and NU (= number), though, so +1 is unbreakable whereas +A is breakable.
Opera 9.5 does not break +A, but who knows whether some next version will
follow UAX #14 to more madness?

> If you want to prevent wrapping, because after all C++ is a special
> use of "+", use white-space: nowrap.

That's one possibility and works most of the time, but do you really want to
let such things depend on _styling_? I don't think it is a matter of
optional presentational features whether you speak of "C++" or of "C+ +" or
"C ++".

For reasons explained at
http://www.cs.tut.fi/~jkorpela/html/nobr.html
I think it is preferable to use <nobr>C++</nobr>. When standards are wrong,
don't let them prevent you from doing things the best possible way.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Sep 13 '08 #4
On 2008-09-13, Jukka K. Korpela <jk******@cs.tut.fi> wrote:
> Ben C wrote:
>> On 2008-09-13, Boris <bo****@web.de> wrote:
>>> Can anyone tell me if Opera 9.5 is behaving correctly when wrapping
>>> the word C++, eg:
>>>
>>> C+
>>> +
>>
>> The specification that defines all this is Unicode Standard Annex #14.
>
> Not really.

OK, but I don't know of another specification for it, and I suspect it
may be the one Opera are actually using.

>> Browsers don't have to follow that specification to claim they support
>> HTML and/or CSS,
>
> Thus, UAX #14 does _not_ define whether the behavior is correct or not.

Yes.

> HTML (or CSS) specifications do not require conformance to the Unicode
> Standard. (They define things in terms of it, or rather its partial
> equivalent ISO 10646, but that's a different issue.) Moreover, UAX #14,
> though part of the standard, is not normative except for a few parts, so
> even if Opera claimed Unicode conformance, it could wrap C++ as it likes,
> formally speaking.

Yes, I did try to say that.

>> but it's the easiest way to support a large number of
>> world languages.
>
> I disagree; see http://www.cs.tut.fi/~jkorpela/unicode/linebr.html for some
> arguments.

You make some good points there. But still, implementing line-breaking
for every language you want to support without a specification is quite
a daunting prospect.

Even Japanese and Chinese aren't that easy-- there are various
bracket and quote characters to watch out for.

[...]

> The old principle of treating only whitespace as allowable break point
> generally works better, though it naturally fails for languages that
> don't use whitespace between words - but such problems should be
> solved in a different way.

How?

>> The long and short of it is that "+" has the "Line breaking class" of
>> PR or "Prefix" so it's treated a bit like a currency symbol.
>
> Yes. But the meaning of PR is described vaguely in UAX #14, and the prose
> part contradicts the formal parts.

>> Now, I think that means that "by default" you can't break "C+" or
>> "+C", based on the definition of PR.
>>
>> Not sure what they mean by "by default". I haven't read the whole
>> spec.
>
> I have read the whole spec, and I'm not sure what they mean by "by default".

I also note from your document that "PR" is one of the "informative"
classes.

>> But if you look at the pair table in section 7.3, that says you can
>> break between "C+" (and "++") but not between "+C", which is just what
>> Opera is doing. (The class of "C" is "AL"-- "alphabetic" or
>> something). By 'break between "C+"' I mean break between the "C" and
>> the "+".
>
> The formal rules before the pair table imply that a "direct break" is
> allowed between AL and PR as well as between PR and PR, so in "C++", a break
> is permitted anywhere (between C and + as well as between + and +), and
> Opera works that way. "Direct break" means that a break is allowed even if
> no space intervenes.

I think that's the same thing the table is saying.

> Actually, it seems to me that the pair table, as well as the formal rules,
> also permits direct break between PR and AL.

There's definitely a "PR x AL" in LB24. Perhaps you're looking at
LB25...

> They prevent a break between PR
> and NU (= number), though, so +1 is unbreakable whereas +A is breakable.
> Opera 9.5 does not break +A, but who knows whether some next version will
> follow UAX #14 to more madness?

I still maintain +A is unbreakable according to LB24.

I wouldn't be surprised if Opera followed UAX #14 pretty strictly
already. They don't strike me as the types to do things by halves.

>> If you want to prevent wrapping, because after all C++ is a special
>> use of "+", use white-space: nowrap.
>
> That's one possibility and works most of the time, but do you really want to
> let such things depend on _styling_? I don't think it is a matter of
> optional presentational features whether you speak of "C++" or of "C+ +" or
> "C ++".

I don't have a strong view on that.
Sep 13 '08 #5
Ben C wrote:
> But still, implementing line-breaking
> for every language you want to support without a specification is
> quite a daunting prospect.

UAX #14 does _not_ define line breaking rules for all languages, or for
_any_ language. It specifies some _general_ rules, which largely revolve
around special characters.

If you wanted to have line breaking by the rules of English, Finnish, and
Russian, for example, your main concern should be hyphenation (which is
rather different in nature in those languages). You would need to deal with
some special issues with special characters (e.g. apostrophe and colon)
which may appear in words. The rest would be basically wrapping at
whitespace. Anything you add to that is probably external to all of those
languages. If the text mentions "C++" or the C++ expression "i++", it's to
be handled differently from rules for English, Finnish, and Russian.
Generally, you should treat it as indivisible. And if you need to line wrap
C++ code, for example, special rules are needed, rules specific to the C++
"language".

>> The old principle of treating only whitespace as allowable break
>> point generally works better, though it naturally fails for languages
>> that don't use whitespace between words - but such problems should be
>> solved in a different way.
>
> How?

I'm sure experts on different languages can present good answers to such
questions. After all, languages like Chinese were written and printed long
before Unicode was invented. Part of the rules might be formulated as rules
for line breaking behavior of characters, but they would not take us very
far. General character-level rules would work when some characters are only
used in specific languages that e.g. always allow a break after those
characters. But the rules can be more complicated and much above the
character level.

>> Actually, it seems to me that the pair table, as well as the formal
>> rules, also permits direct break between PR and AL.
>
> There's definitely a "PR x AL" in LB24. Perhaps you're looking at
> LB25...

You're right. I somehow managed to lose track when looking at the pair table
_and_ to miss LB24 when quickly searching for all occurrences of "PR" in the
rules.

> I still maintain +A is unbreakable according to LB24.

Right.

> I wouldn't be surprised if Opera followed UAX #14 pretty strictly
> already. They don't strike me as the types to do things by halves.

I'm afraid you might be right. Opera seems to have started to fail to wrap
in a context like
"foo" (bar)
since by UAX #14, a break is not allowed between the ASCII apostrophe and an
opening parenthesis. Opera also follows UAX #14 and the example set by IE in
wrapping between a letter and an opening parenthesis even when no space
intervenes, as in
foo(bar)

Excuse me while I jump on the walls and talk incomprehensibly to myself.

Far from dealing with line wrapping automatically, this effectively forces
authors to use the "nonstandard" tags <nobr> and <wbr> liberally whenever
they use anything but letters, digits, and basic punctuation in texts.

>> I don't think it is a
>> matter of optional presentational features whether you speak of
>> "C++" or of "C+ +" or "C ++".
>
> I don't have a strong view on that.

I do, because whitespace is significant there; "C++" is a single name,
whereas "C ++" is another name (of another language) followed by a space and
an operator.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Sep 13 '08 #6
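
The pair rules the posters end up agreeing on (a direct break is allowed between AL and PR and between PR and PR, but not between PR and AL or between PR and NU) can be sketched in a few lines of Python. This is only a toy model of what is said in the posts above, not an implementation of UAX #14: the three-class lookup, the `PAIR_RULES` table, and the function names are assumptions made for illustration, spaces and all other classes are left out, and the rules are taken from the thread as written in 2008, so they may not match the current text of the annex.

```python
# Toy model of the pair rules discussed in this thread.
# NOT a UAX #14 implementation: three classes only, no spaces, no other rules.

def line_break_class(ch):
    """Return a simplified line breaking class for one character."""
    if ch == "+":
        return "PR"  # "Prefix", as stated in the thread
    if ch.isdigit():
        return "NU"  # number
    if ch.isalpha():
        return "AL"  # alphabetic
    raise ValueError(f"character {ch!r} is outside this toy model")

# (class before, class after) -> True if a direct break is allowed.
# Only the pairs the posters actually discuss are listed.
PAIR_RULES = {
    ("AL", "PR"): True,   # C|+
    ("PR", "PR"): True,   # +|+
    ("PR", "AL"): False,  # +A, "PR x AL" in LB24 as cited in the thread
    ("PR", "NU"): False,  # +1
}

def break_opportunities(text):
    """Return every index i where a break is allowed between text[i-1] and text[i]."""
    classes = [line_break_class(ch) for ch in text]
    result = []
    for i in range(1, len(classes)):
        pair = (classes[i - 1], classes[i])
        if pair not in PAIR_RULES:
            raise ValueError(f"pair {pair} is not covered by this toy model")
        if PAIR_RULES[pair]:
            result.append(i)
    return result

print(break_opportunities("C++"))  # [1, 2]: C|+ and +|+ are both allowed
print(break_opportunities("+A"))   # []
print(break_opportunities("+1"))   # []
```

The table deliberately raises an error for any pair the thread does not mention, rather than guessing at what the annex says about it.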
On 2008-09-13, Jukka K. Korpela <jk******@cs.tut.fi> wrote:
> Ben C wrote:
>> But still, implementing line-breaking
>> for every language you want to support without a specification is
>> quite a daunting prospect.
>
> UAX #14 does _not_ define line breaking rules for all languages, or for
> _any_ language. It specifies some _general_ rules, which largely revolve
> around special characters.

But LineBreak.txt gives you a breaking class for most of the characters.
That is effectively the language-specific information.

It's probably script-specific rather than language-specific. But it gets
quite tricky: Korean is sometimes broken at spaces even though it uses
basically ideographs.

> If you wanted to have line breaking by the rules of English, Finnish, and
> Russian, for example, your main concern should be hyphenation (which is
> rather different in nature in those languages).

I didn't know that. But browsers don't do hyphenation anyway.

[...]

>>> The old principle of treating only whitespace as allowable break
>>> point generally works better, though it naturally fails for languages
>>> that don't use whitespace between words - but such problems should
>>> be solved in a different way.
>>
>> How?
>
> I'm sure experts on different languages can present good answers to such
> questions.

My original point was that if you just implement UAX #14 you don't need
any experts on all the different languages. I take your word for it that
the results might not be as good.

> After all, languages like Chinese were written and printed long before
> Unicode was invented. Part of the rules might be formulated as rules
> for line breaking behavior of characters, but they would not take us
> very far. General character-level rules would work when some
> characters are only used in specific languages that e.g. always allow
> a break after those characters. But the rules can be more complicated
> and much above the character level.

But this does make life awfully difficult for people trying to make
browsers (and word processors, etc.).

[...]

>> I wouldn't be surprised if Opera followed UAX #14 pretty strictly
>> already. They don't strike me as the types to do things by halves.
>
> I'm afraid you might be right. Opera seems to have started to fail to
> wrap in a context like "foo" (bar) since by UAX #14, a break is not
> allowed between the ASCII apostrophe

Is '"' an ASCII apostrophe? Even so, Opera does refuse to break "foo"
(bar). And " has the same breaking class as '.

> and an opening parenthesis. Opera also follows UAX #14 and the example
> set by IE in wrapping between a letter and an opening parenthesis even
> when no space intervenes, as in foo(bar)
>
> Excuse me while I jump on the walls and talk incomprehensibly to myself.

Knock yourself out. Refusing to break "foo" (bar) but breaking foo(bar)
is egregious.
Sep 13 '08 #7
In comp.infosystems.www.authoring.html message <op.uhfblsh59dsao3@burk>,
Sat, 13 Sep 2008 16:23:42, Boris <bo****@web.de> posted:
> Can anyone tell me if Opera 9.5 is behaving correctly when wrapping the
> word C++, eg:

With sensible rules, a three-character "word" would never be broken.
And I don't much like the idea of breaking to give a two-character
fragment, unless it is positively determined that the word breaks well
there.

--
(c) John Stockton, nr London UK. ??*@merlyn.demon.co.uk Turnpike v6.05 MIME.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Check boilerplate spelling -- error is a public sign of incompetence.
Never fully trust an article from a poster who gives no full real name.
Sep 13 '08 #8
On 14-09-08 00:38, Ben C wrote:
> Is '"' an ASCII apostrophe? Even so, Opera does refuse to break "foo"
> (bar). And " has the same breaking class as '.

But Thunderbird seems to have no problem with it :-) Or was that slrn?

H.
--
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
www.lieverleven.be
http://catb.org/~esr/faqs/smart-questions.html
Sep 14 '08 #9
On 2008-09-14, Hendrik Maryns <ia*******@sneakemail.com> wrote:
> On 14-09-08 00:38, Ben C wrote:
>> Is '"' an ASCII apostrophe? Even so, Opera does refuse to break "foo"
>> (bar). And " has the same breaking class as '.
>
> But Thunderbird seems to have no problem with it :-) Or was that slrn?

Vim, which just breaks at spaces.
Sep 14 '08 #10
On Sat, 13 Sep 2008 22:42:30 +0200, Jukka K. Korpela <jk******@cs.tut.fi>
wrote:
> [...] For reasons explained at
> http://www.cs.tut.fi/~jkorpela/html/nobr.html
> I think it is preferable to use <nobr>C++</nobr>. When standards are
> wrong, don't let them prevent you from doing things the best possible
> way.

I had asked before in the newsgroup opera.page-display, where someone
recommended using &#x2060; (see
http://groups.google.com/group/opera...0f9dfc99a1642).
I haven't checked yet, though, what other browsers are going to do when they
see something like C&#x2060;+&#x2060;+ - not sure if this opens another
can of worms?

Boris
Sep 14 '08 #11
Ben C wrote:
> But LineBreak.txt gives you a breaking class for most of the
> characters. That is effectively the language-specific information.

It's by definition language-independent: it assigns properties to
characters, no matter which (if any) language they are used in. Admittedly,
_some_ characters are used in one language only. But that's coincidental
and may change without notice.

> It's probably script-specific rather than language-specific.

Not really. It's character-specific. The Unicode Standard assigns a script
property to each character, but many characters are used across scripts.

> But browsers don't do hyphenation anyway.

That's a big part of the problem. When you don't hyphenate, you often get
horrible layout for texts containing long words. Little does it help to
break poor little "C++" then, and it's just incorrect.

> My original point was that if you just implement UAX #14 you don't
> need any experts on all the different languages.

I'm afraid that's a common misconception, and UAX #14 doesn't try very hard
to prevent it.

>> [...] But the rules can be more complicated
>> and much above the character level.
>
> But this does make life awfully difficult for people trying to make
> browsers (and word processors, etc.).

Actually, word processors often handle it decently, for the languages
they support. Web browsers are much more primitive in handling texts, but
there is no reason why they could not have language-dependent line breaking.

>> I'm afraid you might be right. Opera seems to have started to fail to
>> wrap in a context like "foo" (bar) since by UAX #14, a break is not
>> allowed between the ASCII apostrophe
>
> Is '"' an ASCII apostrophe?

Sorry, my mistake; it's the ASCII quotation mark, treated as "neutral"
quotation mark in UAX #14. But as you say, the ASCII apostrophe (') has a
similar issue. So do the "left" and "right" quotation marks, U+201C and
U+201D, i.e. the normal English quotes, since they are in fact used in
different ways in different languages.

The original idea in HTML was that any space was a permitted line break
point and that no other line breaks would be generated in formatting, except
possibly after a hyphen (and except of course when explicitly specified in
markup). This is coarse and doesn't work at all for many languages. But at
least it does not arbitrarily break strings and it does not arbitrarily
prevent line breaks e.g. between a quoted string and a parenthetic string
when a space intervenes.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Sep 14 '08 #12
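
The "original idea" described in the post above, breaking only at whitespace and possibly after a hyphen, can be tried out with Python's standard `textwrap` module. This is a rough stand-in and not what any browser does: the sample sentence and the width of 6 are arbitrary choices for illustration, and `textwrap` applies a few heuristics of its own.

```python
import textwrap

# Arbitrary sample text containing the two strings discussed in the thread.
text = "Programs in C++ and foo(bar) stay whole"

# break_long_words=False: never split a token that is wider than the line.
# break_on_hyphens=True: allow a break after a hyphen inside a compound word.
lines = textwrap.wrap(
    text,
    width=6,
    break_long_words=False,
    break_on_hyphens=True,
)

for line in lines:
    print(line)
```

Tokens such as C++ and foo(bar) are never split under these settings; the price is that a line overflows the requested width whenever a single token is longer than it.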
On 2008-09-14, Jukka K. Korpela <jk******@cs.tut.fi> wrote:
> Ben C wrote:
>> But LineBreak.txt gives you a breaking class for most of the
>> characters. That is effectively the language-specific information.
>
> It's by definition language-independent: it assigns properties to
> characters, no matter which (if any) language they are used in. Admittedly,
> _some_ characters are used in one language only. But that's coincidental
> and may change without notice.

>> It's probably script-specific rather than language-specific.
>
> Not really. It's character-specific. The Unicode Standard assigns a script
> property to each character, but many characters are used across scripts.

Where I suppose line-breaking conventions may be different (and also
across languages).

[...]

>> My original point was that if you just implement UAX #14 you don't
>> need any experts on all the different languages.
>
> I'm afraid that's a common misconception, and UAX #14 doesn't try very
> hard to prevent it.

I am now starting to feel a bit let-down by UAX #14. Life is not so
simple after all.
Sep 14 '08 #13
On 2008-09-14, Boris <bo****@web.de> wrote:
> On Sat, 13 Sep 2008 22:42:30 +0200, Jukka K. Korpela <jk******@cs.tut.fi>
> wrote:
>> [...] For reasons explained at
>> http://www.cs.tut.fi/~jkorpela/html/nobr.html
>> I think it is preferable to use <nobr>C++</nobr>. When standards are
>> wrong, don't let them prevent you from doing things the best possible
>> way.
>
> I had asked in the newsgroup opera.page-display before where someone
> recommended to use &#x2060; (see
> http://groups.google.com/group/opera...0f9dfc99a1642).
> I haven't checked yet though what other browsers are going to do when they
> see something like C&#x2060;+&#x2060;+ - not sure if this opens another
> can of worms?

It might work in Opera, but not necessarily in other browsers which
aren't so sold on the whole Unicode thing. It is mentioned in Korpela's
nobr page above.

Your choices are basically:

1. that (&#x2060; between the characters)
2. nobr
3. white-space: nowrap

None are worm-free. Choose your compromise.
Sep 14 '08 #14
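
For comparison, the three choices listed in the last post can be generated side by side. The sketch below is only an illustration: the helper name `nowrap_variants` is hypothetical, and wrapping the token in a `<span>` with an inline style is one assumed way of applying `white-space: nowrap`, since the thread does not say which element to put it on. Whether a given browser honours each variant is exactly the open question in the thread, and the code does not answer it.

```python
from html import escape

# Numeric character reference for U+2060 WORD JOINER, as suggested in the thread.
WORD_JOINER = "&#x2060;"

def nowrap_variants(token):
    """Return three HTML fragments that each try to keep `token` on one line."""
    safe = escape(token)
    return {
        # 1. a word joiner between every pair of adjacent characters
        "word_joiner": WORD_JOINER.join(escape(ch) for ch in token),
        # 2. the nonstandard <nobr> element
        "nobr": f"<nobr>{safe}</nobr>",
        # 3. CSS on a wrapper element (the <span> is an assumption)
        "css": f'<span style="white-space: nowrap">{safe}</span>',
    }

for name, markup in nowrap_variants("C++").items():
    print(f"{name}: {markup}")
```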
