Bytes | Software Development & Data Engineering Community

Soft-hyphens or breakable points in a string

Hi,

My page has a table with many columns such that the right-side of the
table gets chopped off when printed. I specify a table width of 100%,
but otherwise no cell dimensions are specified. The culprits are 2 wide
columns which contain e-mail addresses.

I can get the page to fit entirely on the printer output if the browser
would break the e-mail address string at the '@' symbol. What I've done
for now is replace the '@' in all e-mail addresses with
'[space]@[space]', which now wraps nicely, and my table fits. However,
this is somewhat undesirable because for e-mail addresses that are
already short, it shows the address with spaces around the '@' symbol.

Is there an HTML trick I can use that tells the browser that it is
permissible, but only if needed, to break the string at the '@' or dot
(.), much like the soft-hyphen does in Word?

Mark
Sep 12 '05 #1
Mark wrote:
My page has a table with many columns such that the right-side of the
table gets chopped off when printed. I specify a table width of 100%,
but otherwise no cell dimensions are specified. The culprits are 2 wide
columns which contain e-mail addresses.
I would primarily consider the possibilities for reducing the amount of
information per row. In the absence of a URL demonstrating the actual
problem, I cannot make any more specific suggestion.

Secondarily, I would consider whether it is possible to reduce the width
requirements of _other_ columns than those containing E-mail addresses.
The reason is that breaking an E-mail address may cause confusion and
even give a wrong idea of what the address is.
I can get the page to fit entirely on the printer output if the browser
would break the e-mail address string at the '@' symbol.
The Unicode line breaking rules define "@" as belonging to line breaking
class AL, i.e. as comparable to alphabetic characters. Although those
rules are generally highly debatable, there is wisdom behind this
particular assignment. The at sign is typically used in contexts like
E-mail addresses, URLs, and programming language constructs where a line
break between "@" and an adjacent letter would not be appropriate. An
E-mail address is basically an unbreakable string that must not contain
whitespace (except in a comment).

Thus, I would avoid breaking an E-mail address at almost any cost.
What I've done
for now is replace the '@' in all e-mail addresses with
'[space]@[space]', which now wraps nicely, and my table fits.
That's even worse, since it introduces whitespace on both sides of "@". A
naive user might even think that the space is part of the address.
(After all, few people in the world know the _exact_ syntax of E-mail
addresses, i.e. the variations, complications, and special cases that
are allowed.)
Is there an HTML trick I can use that tells the browser that it is
permissible, but only if needed, to break the string at the '@' or dot
(.), much like the soft-hyphen does in Word?


There is the <wbr> trick, e.g.
jkorpela@<wbr>cs.<wbr>tut.<wbr>fi
It's genuinely a trick: it works in most browsing situations but does
not conform to any standard. There's also the standard-conforming way of
using a zero width no-break space, which works very rarely and causes
quite some trouble when it doesn't. See
http://www.cs.tut.fi/~jkorpela/html/nobr.html#suggest
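For a table like Mark's, the markup can be generated rather than typed by hand. A minimal Python sketch, assuming the addresses are plain strings that still need HTML escaping (the function name is hypothetical), that reproduces the example above by putting a break opportunity after the "@" and after every dot. Passing `marker="&#8203;"` writes a zero width space (U+200B) reference instead of `<wbr>`:

```python
import html

def add_break_points(address: str, marker: str = "<wbr>") -> str:
    """Return HTML for `address` with `marker` after '@' and after each '.'."""
    # Escape first; html.escape never introduces '@' or '.', so the
    # positions found below are exactly those of the original address.
    escaped = html.escape(address)
    out = []
    for ch in escaped:
        out.append(ch)
        if ch in "@.":
            out.append(marker)  # a permitted, not a forced, line break
    return "".join(out)

print(add_break_points("jkorpela@cs.tut.fi"))
# prints jkorpela@<wbr>cs.<wbr>tut.<wbr>fi
```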

According to the reputable "Chicago Manual of Style" (clause 7.44), if a
URL or E-mail address needs to be broken, the break should appear
"between elements, after a colon, a slash, a double slash, or the symbol
@ but before a period or any other punctuation or symbols". I think
there's wisdom in breaking before rather than after a period: a period at
the end of a line will easily be seen as terminating the address, whereas
a period at the start of a line suggests that it is a continuation of
the preceding line.
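Applied to e-mail addresses, the Chicago Manual of Style rule differs from the `<wbr>` example only in where the marker goes around a period. A Python sketch, assuming only "@" and "." matter (CMOS also mentions colons, slashes, and other punctuation, which this hypothetical helper ignores):

```python
import html

def cmos_break_points(address: str, marker: str = "<wbr>") -> str:
    """Insert `marker` after '@' and before each '.', per the CMOS advice."""
    out = []
    for ch in html.escape(address):
        if ch == ".":
            out.append(marker)  # break *before* the period
        out.append(ch)
        if ch == "@":
            out.append(marker)  # break *after* the at sign
    return "".join(out)

print(cmos_break_points("jkorpela@cs.tut.fi"))
# prints jkorpela@<wbr>cs<wbr>.tut<wbr>.fi
```

A line that ends in "cs" and a next line starting with ".tut" then signals continuation, which is the reasoning given above.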

P.S. The soft hyphen does _not_ work the way you think in MS Word. If
you enter a soft hyphen character, MS Word treats it as yet another
graphic character and displays it on all occasions. You can use an MS
Word command to add "soft hyphen", but what really happens is that a
normal hyphen-minus "-" is inserted, together with invisible extra
information that forbids a line break after it.
Sep 12 '05 #2
On Mon, 12 Sep 2005, Jukka K. Korpela wrote:
P.S. The soft hyphen does _not_ work the way you think in MS Word. If
you enter a soft hyphen character, MS Word treats it as yet another
graphic character and displays it on all occasions. You can use an MS
Word command to add "soft hyphen", but what really happens is that a
normal hyphen-minus "-" is inserted, together with invisible extra
information that forbids a line break after it.


Your statement is meaningless when you don't define which character
the "soft hyphen" is supposed to be. In older versions of MS Word for
Macintosh as well as for Windows, character 31 = 0x1F = 037 was used
for the soft hyphen. I don't know if this is still true for Word 2003.

You can check this by inserting char \037 into a text file or
the expression \'1f into an RTF file.
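Andreas's check can be scripted. The Python sketch below only produces the two test inputs (the file names, the sample word, and the function names are arbitrary choices for illustration); what a given version of Word then makes of character 31 is the thing being tested, so no outcome is assumed:

```python
from pathlib import Path

def plain_text_sample() -> bytes:
    """Plain text with character 31 (0x1F, octal 037) inside a word."""
    return b"hy\x1fphen\r\n"

def rtf_sample() -> str:
    """Minimal RTF in which \\'1f is the hex escape for the same character."""
    return "{\\rtf1\\ansi hy\\'1fphen\\par}\n"

def write_samples(directory: str) -> None:
    folder = Path(directory)
    (folder / "char31.txt").write_bytes(plain_text_sample())
    (folder / "char31.rtf").write_text(rtf_sample(), encoding="ascii")

# write_samples(".")  # then open both files in Word and look at "hyphen"
```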

Sep 12 '05 #3
Andreas Prilop <nh******@rrzn-user.uni-hannover.de> wrote:
On Mon, 12 Sep 2005, Jukka K. Korpela wrote:
P.S. The soft hyphen does _not_ work the way you think in MS Word. If
you enter a soft hyphen character, MS Word treats it as yet another
graphic character and displays it on all occasions. You can use an MS
Word command to add "soft hyphen", but what really happens is that a
normal hyphen-minus "-" is inserted, together with invisible extra
information that forbids a line break after it.
Your statement is meaningless when you don't define which character
the "soft hyphen" is supposed to be.


In the absence of a reference to any other standard or specification,
I think it is fair to postulate the understanding that "soft hyphen" means
the character named that way in ISO 10646, Unicode, and standards in the
ISO 8859 family.
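Under that definition, the character can be examined with Python's standard `unicodedata` module. This sketch contrasts it with character 31 from Andreas's message:

```python
import unicodedata

SHY = "\u00ad"  # SOFT HYPHEN: code position 0xAD in Unicode and in ISO 8859-1

print(unicodedata.name(SHY))          # SOFT HYPHEN
print(unicodedata.category(SHY))      # Cf (a format character)
print(SHY.encode("iso-8859-1"))       # b'\xad'
print(f"&#{ord(SHY)};")               # &#173; (HTML also has the entity &shy;)

# Character 31 is not a soft hyphen in these standards but the C0 control
# UNIT SEPARATOR; unicodedata.name() raises ValueError for it.
print(unicodedata.category("\x1f"))   # Cc
```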
In older versions of MS Word for
Macintosh as well as for Windows, character 31 = 0x1F = 037 was used
for the soft hyphen. I don't know if this is still true for Word 2003.

You can check this by inserting char \037 into a text file or
the expression \'1f into an RTF file.


That might be true - I haven't studied what Word really inserts when you
give the command for inserting an "Optional Hyphen" (that seems to be what
MS Word calls it in the English version). But if I save a document in RTF
format from MS Word, "Optional Hyphen" gets turned into "\-".
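If RTF written by Word marks optional hyphens as `\-`, a converter that wants the Unicode soft hyphen could map one to the other. A deliberately small Python sketch with a hypothetical function name; it understands only `\-` and the escaped backslash `\\`, not RTF groups, `\uN` escapes, or any other control word:

```python
def rtf_optional_hyphens_to_shy(rtf: str) -> str:
    """Replace RTF's optional hyphen (backslash + '-') with U+00AD SOFT HYPHEN.

    Only that control symbol and the escaped backslash (two backslashes)
    are recognized; everything else is copied unchanged. Not an RTF parser.
    """
    out = []
    i = 0
    while i < len(rtf):
        if rtf[i] == "\\" and i + 1 < len(rtf):
            nxt = rtf[i + 1]
            if nxt == "-":
                out.append("\u00ad")  # optional hyphen -> soft hyphen
                i += 2
                continue
            if nxt == "\\":
                out.append("\\\\")    # escaped backslash: copy both characters
                i += 2
                continue
        out.append(rtf[i])
        i += 1
    return "".join(out)

print(repr(rtf_optional_hyphens_to_shy(r"hy\-phen")))  # 'hy\xadphen'
```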

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Sep 12 '05 #4
On Mon, 12 Sep 2005, Jukka K. Korpela wrote:
In older versions of MS Word for
Macintosh as well as for Windows, character 31 = 0x1F = 037 was used
for the soft hyphen. I don't know if this is still true for Word 2003.


That might be true - I haven't studied what Word really inserts when you
give the command for inserting an "Optional Hyphen" (that seems to be what
MS Word calls it in the English version).


Here is a text file that contains char 31 = 0x1F several times:
http://www.unics.uni-hannover.de/nht...oft-hyphen.txt
Word 97 recognizes this character as soft hyphen.

Sep 14 '05 #5

On Mon, 12 Sep 2005, Jukka K. Korpela wrote:
Mark wrote:
My page has a table with many columns such that the right-side of the
table gets chopped off when printed. I specify a table width of 100%,
but otherwise no cell dimensions are specified. The culprits are 2 wide
columns which contain e-mail addresses.
I would primarily consider the possibilities for reducing the amount of
information per row. In the absence of a URL demonstrating the actual
problem, I cannot make any more specific suggestion.

Secondarily, I would consider whether it is possible to reduce the width
requirements of _other_ columns than those containing E-mail addresses.
The reason is that breaking an E-mail address may cause confusion and
even give a wrong idea of what the address is.
I can get the page to fit entirely on the printer output if the browser
would break the e-mail address string at the '@' symbol.


The Unicode line breaking rules define "@" as belonging to line breaking
class AL, i.e. as comparable to alphabetic characters. Although those
rules are generally highly debatable, there is wisdom behind this
particular assignment. The at sign is typically used in contexts like
E-mail addresses, URLs, and programming language constructs where a line
break between "@" and an adjacent letter would not be appropriate. An
E-mail address is basically an unbreakable string that must not contain
whitespace (except in a comment).


Check out RFC 822.

[blockquote]

3.1.4. STRUCTURED FIELD BODIES

To aid in the creation and reading of structured fields, the
free insertion of linear-white-space (which permits folding
by inclusion of CRLFs) is allowed between lexical tokens.
Rather than obscuring the syntax specifications for these
structured fields with explicit syntax for this
linear-white-space, the existence of another "lexical" analyzer is assumed.
This analyzer does not apply for unstructured field bodies
that are simply strings of text, as described above. The
analyzer provides an interpretation of the unfolded text
composing the body of the field as a sequence of lexical
symbols.

These symbols are:

- individual special characters
- quoted-strings
- domain-literals
- comments
- atoms

The first four of these symbols are self-delimiting. Atoms
are not; they are delimited by the self-delimiting symbols and
by linear-white-space. For the purposes of regenerating
sequences of atoms and quoted-strings, exactly one SPACE is
assumed to exist, and should be used, between them. (Also, in
the "Clarifications" section on "White Space", below, note the
rules about treatment of multiple contiguous LWSP-chars.)

So, for example, the folded body of an address field

":sysmail"@ Some-Group. Some-Org,
Muhammed.(I am the greatest) Ali @(the)Vegas.WBA

is analyzed into the following lexical symbols and types:

:sysmail              quoted string
@                     special
Some-Group            atom
.                     special
Some-Org              atom
,                     special
Muhammed              atom
.                     special
(I am the greatest)   comment
Ali                   atom
@                     atom
(the)                 comment
Vegas                 atom
.                     special
WBA                   atom

The canonical representations for the data in these addresses
are the following strings:

":sysmail"@Some-Group.Some-Org

and

Mu**********@Vegas.WBA

[/blockquote]
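The lexical analysis and canonical form in the quoted section can be approximated in a few dozen lines of Python. This is a simplified sketch for a single addr-spec only: it assumes no route addresses, groups, or display names, drops comments and whitespace, and joins words with nothing around "." and "@". The function names are hypothetical.

```python
SPECIALS = set('()<>@,;:\\".[]')
WHITESPACE = " \t\r\n"
WORD_KINDS = {"atom", "quoted-string", "domain-literal"}

def tokenize(text):
    """Split text into (kind, value) tokens, roughly per RFC 822 section 3.1.4.
    Comments and linear-white-space are discarded."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c in WHITESPACE:
            i += 1
        elif c == "(":  # comment: may nest and contain quoted-pairs
            depth = 0
            while True:
                if i >= n:
                    raise ValueError("unterminated comment")
                ch = text[i]
                if ch == "\\":
                    i += 2
                    continue
                i += 1
                if ch == "(":
                    depth += 1
                elif ch == ")":
                    depth -= 1
                    if depth == 0:
                        break
        elif c == '"':  # quoted-string, kept with its quotes
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            if j >= n:
                raise ValueError("unterminated quoted-string")
            tokens.append(("quoted-string", text[i:j + 1]))
            i = j + 1
        elif c == "[":  # domain-literal (quoted-pairs not handled here)
            j = text.find("]", i)
            if j < 0:
                raise ValueError("unterminated domain-literal")
            tokens.append(("domain-literal", text[i:j + 1]))
            i = j + 1
        elif c in SPECIALS:
            tokens.append(("special", c))
            i += 1
        else:  # atom: a run of characters that are neither specials nor whitespace
            j = i
            while j < n and text[j] not in SPECIALS and text[j] not in WHITESPACE:
                j += 1
            tokens.append(("atom", text[i:j]))
            i = j
    return tokens

def canonical(address):
    """Rebuild an addr-spec: no whitespace around '.' or '@',
    exactly one SPACE between directly adjacent words."""
    parts = []
    prev_kind = None
    for kind, value in tokenize(address):
        if kind in WORD_KINDS and prev_kind in WORD_KINDS:
            parts.append(" ")
        parts.append(value)
        prev_kind = kind
    return "".join(parts)

print(canonical('":sysmail"@ Some-Group. Some-Org'))
# prints ":sysmail"@Some-Group.Some-Org
```

Because comments and whitespace are dropped before the words are rejoined, the spaces around "Ali" in the second example do not survive into the canonical form.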

Muhammed.(I am the greatest) Ali @(the)Vegas.WBA
                            ^   ^
                            |   |
That example appears to have two spaces in it that are not within
parentheses.

I have received more than one request for anti-virus help sent to the
"mailto:" address on my CIH virus page at
http://www.chebucto.ns.ca/~af380/CIH.html

HREF="mailto:%20af380@( Norman )chebucto( De Forest ).ns( CIH.html ).ca"

With spaces *outside* the parentheses the address still works fine with
Lynx on my ISP's system but some email software on some systems
(**cough**cough**Microsoft**cough**) fails to strip out the spaces outside
the comments when doing a DNS lookup on the hostname and/or when passing
the address in the MAIL TO: command (violating the "when passing such
structured information to other systems, such as mail protocol services"
clause quoted below) and thus fails to send the message.

The passage quoted above is immediately followed by:

[blockquote]

Note: For purposes of display, and when passing such
structured information to other systems, such as mail
protocol services, there must be NO linear-white-space
between <word>s that are separated by period (".") or
at-sign ("@") and exactly one SPACE between all other
<word>s. Also, headers should be in a folded form.

[/blockquote]

The "For purposes of display" clause would appear to rule out the original
poster's use of spaces, but the RFC fails to say what should happen if
an address is longer than the character width of a display (only that
any line-break must be followed by a whitespace character (space or tab)).
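The folding rule works in the other direction as well: RFC 822 (section 3.1.1) unfolds a header by treating a CRLF that is immediately followed by a space or tab as that whitespace character. A small Python sketch of just that step (the function name is hypothetical) shows that a fold always leaves a whitespace character behind:

```python
import re

# A CRLF immediately followed by a space or tab (LWSP-char) marks a fold.
_FOLD = re.compile(r"\r\n(?=[ \t])")

def unfold(header: str) -> str:
    """Undo RFC 822 folding: remove the CRLF, keep the following space or tab."""
    return _FOLD.sub("", header)

folded = "To: jkorpela@\r\n cs.tut.fi"
print(repr(unfold(folded)))   # 'To: jkorpela@ cs.tut.fi'
```

The space left behind is linear-white-space, which the lexical analyzer described in the first quoted passage skips; a line break with no following whitespace is not a fold and is left untouched.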

Thus, I would avoid breaking an E-mail address at almost any cost.
What I've done
for now is replace the '@' in all e-mail addresses with
'[space]@[space]', which now wraps nicely, and my table fits.
That's even worse, since it introduces whitespace on both sides of "@". A
naive user might even think that the space is part of the address.
(After all, few people in the world know the _exact_ syntax of E-mail
addresses, i.e. the variations, complications, and special cases that
are allowed.)
Is there an HTML trick I can use that tells the browser that it is
permissible, but only if needed, to break the string at the '@' or dot
(.), much like the soft-hyphen does in Word?


There is the <wbr> trick, e.g.
jkorpela@<wbr>cs.<wbr>tut.<wbr>fi
It's genuinely a trick: it works in most browsing situations but does
not conform to any standard. There's also the standard-conforming way of
using a zero width no-break space, which works very rarely and causes
quite some trouble when it doesn't. See
http://www.cs.tut.fi/~jkorpela/html/nobr.html#suggest


Don't you mean the "zero width non-joiner" there (U+200C, &#8204;) (as
opposed to the zero width joiner, U+200D, &#8205;)?

I think that the use of the zero width non-joiner should be the preferred
way of doing things and that "works very rarely and causes quite some
trouble when it doesn't" should be replaced by "however it may be
necessary to get[1] software authors to fix their buggy treatment of
this character which causes quite some trouble when it doesn't work".

If Lynx can handle it properly (as well as the soft hyphen), why can't IE
and Firefox do the same? (I haven't tried it with Opera yet.)

According to the reputable "Chicago Manual of Style" (clause 7.44), if a
URL or E-mail address needs to be broken, the break should appear
"between elements, after a colon, a slash, a double slash, or the symbol
@ but before a period or any other punctuation or symbols". I think
there's wisdom in breaking before rather than after a period: a period at
the end of a line will easily be seen as terminating the address, whereas
a period at the start of a line suggests that it is a continuation of
the preceding line.

P.S. The soft hyphen does _not_ work the way you think in MS Word. If
you enter a soft hyphen character, MS Word treats it as yet another
graphic character and displays it on all occasions. You can use an MS
Word command to add "soft hyphen", but what really happens is that a
normal hyphen-minus "-" is inserted, together with invisible extra
information that forbids a line break after it.


[1] The following change is optional depending on the reader's
preferences (may wrap on your display but enter as one long line):
s/get software authors to/beat software authors about the head and shoulders until they/
--
``Why don't you find a more appropiate newsgroup to post this tripe into?
This is a meeting place for a totally differnt kind of "vision impairment".
Catch my drift?'' -- "jim" in alt.disability.blind.social regarding an
off-topic religious/political post, March 28, 2005

Sep 26 '05 #6
"Norman L. DeForest" <af***@chebucto.ns.ca> wrote:
Check out RFC 822.
Why? It has been obsoleted by the IETF.
[blockquote]
Pointless. In future, please cite the (relevant) document by clause instead of
<bulkquote>.
That example appears to have two spaces in it that are not within
parentheses.
So? It's still a bad idea to break an E-mail address. Who could guess that
a line break is to be replaced by a space on some occasions and by nothing
on others?
HREF="mailto:%20af380@( Norman )chebucto( De Forest ).ns( CIH.html ).ca"
Irrespective of E-mail address syntax, that violates generic URL syntax,
which forbids unencoded spaces.
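Percent-encoding takes care of the spaces. A Python sketch using the standard `urllib.parse.quote`; the helper name, and the choice to keep the comments instead of first reducing the address to its canonical form, are assumptions:

```python
from urllib.parse import quote

def mailto_href(address: str) -> str:
    """Build a mailto: URL, percent-encoding characters (such as spaces)
    that may not appear literally in a URL."""
    # '@' and the parentheses are kept literal via `safe`; '.' is always
    # safe in quote(). Everything else outside the unreserved set is
    # percent-encoded, so ' ' becomes '%20'.
    return "mailto:" + quote(address, safe="@()")

print(mailto_href("af380@( Norman )chebucto.ns.ca"))
# prints mailto:af380@(%20Norman%20)chebucto.ns.ca
```

Feeding it a string that is already encoded, such as the `%20af380@...` above, would encode the `%` again as `%25`.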
There's also the standard-conforming way of
using a zero width no-break space, which works very rarely and causes
quite some trouble when it doesn't. See
http://www.cs.tut.fi/~jkorpela/html/nobr.html#suggest


Don't you mean the "zero width non-joiner" there (U+200C, &#8204;) (as
opposed to the zero width joiner, U+200D, &#8205;)?


Please trim your quotes or otherwise indicate what you are commenting on; the word
"there" is particularly vague. I meant zero width space instead of zero
width no-break space, of course; the cited page tells this correctly
(I usually write web pages more carefully than Usenet postings).
U+200C and U+200D don't belong here, and I don't mention them in my posting
or on my page; they are for affecting ligature behavior and similar issues.
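The four characters that get confused in this exchange can be told apart with Python's `unicodedata` module. This sketch (hypothetical helper name) prints each code point, its decimal HTML reference, and its Unicode name. Of these, only U+200B is a line-break opportunity; the joiner and non-joiner affect ligatures and cursive joining, as noted above:

```python
import unicodedata

def describe(ch: str) -> str:
    """Code point, decimal HTML character reference, and Unicode name."""
    return f"U+{ord(ch):04X}  &#{ord(ch)};  {unicodedata.name(ch)}"

for ch in ["\u200b", "\ufeff", "\u200c", "\u200d"]:
    print(describe(ch))
# U+200B  &#8203;  ZERO WIDTH SPACE
# U+FEFF  &#65279;  ZERO WIDTH NO-BREAK SPACE
# U+200C  &#8204;  ZERO WIDTH NON-JOINER
# U+200D  &#8205;  ZERO WIDTH JOINER
```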
I think that the use of the zero width non-joiner should be the
preferred way of doing things
Which things? Cursive joining?
and that "works very rarely and causes
quite some trouble when it doesn't" should be replaced by "however it
may be necessary to get[1] software authors to fix their buggy
treatment of this character which causes quite some trouble when it
doesn't work".


That's just playing with words.
--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Sep 27 '05 #7
