That long line, – or — or ...

I don't know the English word, but I'm referring to the double-dash
which is used to separate parts of a sentence. I'm using &#151; so far.
Now I saw &#8211; which is slightly shorter. Some sites use --.

Is there anything I should know to make a good decision on which to use,
other than what looks best? I think the W3C validator is always handing
out errors, even when I go through the different charsets.

I remember a time when the W3C validator would validate my sites even
though it warned me about the charset... these days, it refuses to do
anything. I don't really control the HTTP charset sent in the header.
And I never use meta definitions...
Jul 20 '05 #1


Philipp Lenssen <ph*************@bb-k.com> wrote:
I don't know the English word, but I'm referring to the double-dash
which is used to separate parts of a sentence.
There are two dashes in English typography: em-dash and en-dash.
I'm using &#151; so far.
&#151; is undefined.
Now I saw &#8211; which is slightly shorter.
&#8211; is an en-dash, &#8212; is an em-dash.
Some sites use --.
A surrogate from the typewriter age.
Is there anything I should know to make a good decision on which to use,
other than what looks best?
http://www.cs.tut.fi/~jkorpela/www/windows-chars.html
http://ppewww.ph.gla.ac.uk/~flavell/...cklist.html#s3
I don't really control the HTTP charset sent in the header.
And I never use meta definitions...


See http://www.w3.org/International/O-HTTP-charset
http://ppewww.ph.gla.ac.uk/~flavell/...t/ns-burp.html
how to set the encoding ("charset") of your pages.

--
But thats what FP puts in to the page, so i asume thats correct
Harry H. Arends in microsoft.public.frontpage.client
Jul 20 '05 #2

Philipp Lenssen wrote:
I don't know the English word, but I'm referring to the double-dash
which is used to separate parts of a sentence. I'm using &#151; so far.
That is a control character, specifically, END OF GUARDED AREA. It's not a
dash.

Now I saw &#8211; which is slightly shorter.
That is EN DASH, which seems much more appropriate.

Some sites use --.
Two HYPHEN-MINUS characters. Unicode says it's "used for either hyphen or
minus sign."

It's an ambiguous character, but as it's part of ASCII, more compatible in
some circumstances. It's hard to imagine a situation in which that would
break (aural browsers?), although it's definitely a non-optimal solution.

Unicode suggests the following alternatives:

HYPHEN (U+2010)
NON-BREAKING HYPHEN (U+2011)
FIGURE DASH (U+2012)
EN DASH (U+2013)
MINUS SIGN (U+2212)

I'm not sure, but I think you are describing the EN DASH character, so you
could use &#8211; in an HTML document. EM DASH (U+2014) is another
possibility ("May be used in pairs to offset parenthetical text"), which
you could use in an HTML document as &#8212; (or any of the other ways of
specifying characters in HTML).
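A quick way to check what a numeric reference actually names is Python's standard unicodedata module. This sketch is not from the thread; it looks up the code points listed above plus 151 for comparison, and shows that &#151; lands on a C1 control character (category Cc) with no formal character name, while the dashes are ordinary punctuation. The "<control>" label is a placeholder of our own choosing:

```python
import unicodedata

# Decimal code points for the references discussed in this thread.
REFERENCES = [151, 8208, 8209, 8210, 8211, 8212, 8722]


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME (category Xx)' for a code point."""
    ch = chr(code_point)
    # Control characters have no formal Unicode character name, so
    # unicodedata.name() needs a fallback; "<control>" is our own label.
    name = unicodedata.name(ch, "<control>")
    return f"U+{code_point:04X} {name} (category {unicodedata.category(ch)})"


for cp in REFERENCES:
    print(f"&#{cp};", "->", describe(cp))
```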

Is there anything I should know to make a good decision on which to use,
other than what looks best?
<URL:http://ppewww.ph.gla.ac.uk/~flavell/charset/> includes lots of good
information on character sets, some of which will be useful.

Obviously, the Unicode charts are of some help as well:

<URL:http://www.unicode.org/charts/>

I think the W3C validator is always handing out errors, even when I go
through the different charsets.
Without knowing those errors, nobody can really comment on that.

I remember a time when the W3C validator would validate my sites even
though it warned me about the charset... these days, it refuses to do
anything. I don't really control the HTTP charset sent in the header.


As Alan says, "you don't have the tools necessary to do your job as a web
author".
--
Jim Dabell

Jul 20 '05 #3

In article <MP************************@News.Individual.DE> in
comp.infosystems.www.authoring.html, Philipp Lenssen
<ph*************@bb-k.com> wrote:
I don't know the English word, but I'm referring to the double-dash
which is used to separate parts of a sentence. I'm using &#151; so far.
Now I saw &#8211; which is slightly shorter. Some sites use --.


It's called an em dash, because it's supposed to be one em wide.
(The en dash is called that because ... well, you can guess. One en
is half of one em, though this is false for widths of dashes in some
fonts.)

The best way to put an em dash in your documents is &#8211; -- use
&#8210; for an en dash. (There are also named character entities,
but browser support is not quite as good.)

&#151; and &#150; for em and en dash are just wrong. Any numeric
character references from 128 through 159 are just wrong. They mean
different things on different machines, sometimes even in different
fonts on the same machine. In many fonts on Microsoft Windows
machines, they mean em and en dashes, and that is why a lot of
people use them. But a large minority of visitors to those Web pages
see either nothing or garbage characters in place of the dashes.

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #4

In article <MP************************@News.Individual.DE> in
comp.infosystems.www.authoring.html, Philipp Lenssen
<ph*************@bb-k.com> wrote:
I remember a time when the W3C validator would validate my sites even
though it warned me about the charset... these days, it refuses to do
anything. I don't really control the HTTP charset sent in the header.
And I never use meta definitions...
So what you're saying is that your server doesn't emit a charset,
and you refuse(*) to use the only available backup method; yet you
expect the validator to just guess. Why would you expect it to guess
right? More to the point, why would you expect your visitors'
browsers to guess right?

(*) Maybe this is just a matter of language, and you don't actually
mean you REFUSE to use them. If you mean you have not YET used meta
definitions, it's easy enough to add, between <head> and </head>:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

If you can't get your server administrator to fix the headers sent
by the server, give thanks that there's a decent workaround in the
shape of a META header.

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
Jul 20 '05 #5

In article <MP************************@News.Individual.DE> in
comp.infosystems.www.authoring.html, Philipp Lenssen
<ph*************@bb-k.com> wrote:
I don't know the English word, but I'm referring to the double-dash
which is used to separate parts of a sentence. I'm using &#151; so far.
Now I saw &#8211; which is slightly shorter. Some sites use --.


In my previous follow-up (which I've cancelled, for all the good
that does), I stupidly listed &#8211; and &#8210; for the dashes. It
should be
em dash: &#8212;
en dash: &#8211;
The reason you see &#8211; as "slightly shorter" than &#151; is that
they're different dashes. &#151; is a Microsoftism for the em dash,
which should be &#8212; -- the Microsoftism for the &#8211; en dash
is &#150;.

As I said in the earlier article, _never_ use &#128; through &#159;
in your pages. A significant minority of your visitors will not see
those characters as you intended.
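If you inherit pages full of such references, one hedged way to clean them up is to assume the author meant Windows-1252 and rewrite each &#128;-&#159; reference as the Unicode reference that Windows-1252 assigns to that byte. This Python sketch does only that; the Windows-1252 assumption is a guess about authorial intent, and fix_microsoftisms is a made-up name, so review the output:

```python
import re

# Decimal references &#128; through &#159; (the "Microsoftism" range).
MICROSOFTISM = re.compile(r"&#(12[89]|1[3-5][0-9]);")


def fix_microsoftisms(text: str) -> str:
    """Rewrite &#128;-&#159; as the Unicode reference Windows-1252 assigns.

    Assumption: the author meant Windows-1252. Hexadecimal forms such as
    &#x97; are not handled.
    """
    def repl(m):
        value = int(m.group(1))
        try:
            ch = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            # 129, 141, 143, 144 and 157 have no Windows-1252 mapping;
            # leave them untouched for a human to inspect.
            return m.group(0)
        return f"&#{ord(ch)};"

    return MICROSOFTISM.sub(repl, text)


print(fix_microsoftisms("1685&#150;1750, one thing&#151;another"))
# prints: 1685&#8211;1750, one thing&#8212;another
```

References outside the range, such as &#8212; or &#160;, are left alone by the pattern.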

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
Jul 20 '05 #6

In article <Xn*****************************@193.229.0.31> in
comp.infosystems.www.authoring.html, Jukka K. Korpela
<jk******@cs.tut.fi> wrote:
Most (though not all) authors who say that they don't control HTTP headers
actually have the power to control the basic headers for their pages.


Is that true with Microsoft IIS servers such as the one that holds
<http://www.acad.sunytccc.edu/instruct/sbrown/calc26/default.htm>? I
spent a _very_ unproductive couple of hours at Microsoft's site
trying to figure out how to specify the charset.

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
Jul 20 '05 #7

Stan Brown wrote:
In article <MP************************@News.Individual.DE> in
comp.infosystems.www.authoring.html, Philipp Lenssen
<ph*************@bb-k.com> wrote:
I remember a time when the W3C validator would validate my sites even
though it warned me about the charset... these days, it refuses to do
anything. I don't really control the HTTP charset sent in the header.
And I never use meta definitions...


So what you're saying is that your server doesn't emit a charset,
and you refuse(*) to use the only available backup method; yet you
expect the validator to just guess. Why would you expect it to guess
right? More to the point, why would you expect your visitors'
browsers to guess right?


One thing I forgot to mention:

"In addition, web pages should explicitly set a character set to an
appropriate value in all dynamically generated pages."

-- <URL:http://www.cert.org/advisories/CA-2000-02.html>
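For dynamically generated pages, that advice amounts to putting the charset parameter in the Content-Type header yourself. A minimal sketch, assuming a Python WSGI application (the function name app and the sample body are illustrative only):

```python
def app(environ, start_response):
    """A minimal WSGI application that always declares its encoding."""
    body = "<p>J.S. Bach, 1685\u20131750</p>".encode("utf-8")
    start_response("200 OK", [
        # The charset parameter must name the encoding actually used above.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

You could try it locally with wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever(); the important part is that the declared charset matches the encoding of the bytes actually sent.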

--
Jim Dabell

Jul 20 '05 #8

In article <13*************************@rrzn-user.uni-hannover.de>
in comp.infosystems.www.authoring.html, Andreas Prilop
<nh******@rrzn-user.uni-hannover.de> wrote:
Stan Brown <th************@fastmail.fm> wrote:
Most (though not all) authors who say that they don't control HTTP headers
actually have the power to control the basic headers for their pages.


Is that true with Microsoft IIS servers


Yes, see http://www.w3.org/International/O-HTTP-charset


Thanks, but that's pretty much what I found on Microsoft's site.
Unfortunately it doesn't help me as an author. Only server
administrators can do all that right-clicking stuff. (I should have
made it more clear in my article that I'm an author with no admin
privileges on the IIS server in question.)
--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
Jul 20 '05 #9

Philipp Lenssen <ph*************@bb-k.com> wrote:
Now what's the German version of Em-Dash as described by you --
"used e.g. to make a break in the flow of a sentence"?
I'm not familiar with the exact rules of German punctuation - ich habe
Deutsch nur zwei Jahre in der Schule gelernt (I only studied German at
school for two years) - and questions like this
mostly fall outside the scope of HTML authoring. The question "how do I
present such-and-such a character in HTML" is an HTML question; but
"should I use this or that character in my text" is about orthography
of the relevant language.

Well, maybe the questions are somewhat coupled. The official-looking
http://www.neue-rechtschreibung.de/r...rk_zeichen.htm
uses the undefined reference &#150; as "Gedankenstrich"! Quis custodiat
ipsos custodes? (Who will guard the guards themselves?) They even have,
for their page that specifically
describes the norms for German orthography, a title element that
grossly violates that orthography:
<title>deutsche rechtschreibung</title>
(and doesn't describe the specific content of that page).

What the &#150; is apparently meant to mean is an EN DASH. But we
really cannot be sure. If the person who created that Web page did not
know the meanings of character references, can we know that he knows
the difference between EN DASH and EM DASH and made the correct
decision when describing the official rule? Without knowing the
original decisions, we cannot even know that they make such a
distinction either. I know that for Finnish, the official rules did not
originally make the distinction, and the current rules explicitly say
that you can choose whether you use EN DASH or EM DASH, which
corresponds to the actual usage - it varies, and sometimes it is
impossible to tell when you only see a printed text
whether there's a long variant of an EN DASH or a short variant of an
EM DASH. After all, there are no strict rules on the lengths of those
dashes, though EN DASH tends to have the width of "n" and EM DASH tends
to have the width of "m".
The old site is actually using a single dash for the purpose of
breaking a sentence. It looks really wrong to me.
It's definitely wrong typographically and orthographically. Yet it has
been advisable for robustness, in situations where the character
repertoire that can be reliably used is limited to ASCII, ISO Latin 1,
or something else that does not contain the real dashes.
On a side-note, is it really wrong to use Em-Dash with spacing left
and right? 'Cause that's what I do so far.


Depends on the orthography rules. It seems that originally EN DASH and
EM DASH were variants of one character (and official orthography rules
may still treat them that way), used so that EM DASH touches the
surrounding words whereas EN DASH is separated from them with spaces,
to compensate for the difference in dash length. But according to the
Finnish rules, for example, the use of a dash in the function discussed
here _requires_ spaces around the dash, no matter whether you use
EN DASH or EM DASH.

When considering the choice between EN DASH and EM DASH (when you have
a choice), there's the technicality that by Unicode line breaking
rules, a line break is permitted before and after an EM DASH, whereas
EN DASH belongs to category "break opportunity after". But browsers are
not very consistent in applying those rules, and IE seems to break only
after the dash character in either case, even if the dash is preceded
by a space. This is good behavior of course.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #10

On Thu, 14 Aug 2003, Philipp Lenssen wrote:
OK. Now what's the German version of Em-Dash as described by you --
"used e.g. to make a break in the flow of a sentence"?
Traditionally, Continental typography uses only the en-dash, with or
without spaces on both sides. An en-dash with spaces on both sides
is equivalent to the American em-dash without spaces.
Because I just
gave some alternatives to my team and they preferred the medium-long
version (En-Dash). But it's not used to indicate a range of values.
Yes it is. You write e.g. 128-255 (128 to 255) with an en-dash.
On a side-note, is it really wrong to use Em-Dash with spacing left and
right?


Yes.

Have a look at http://webdesign.crissov.de/Typografie/

Jul 20 '05 #11

On Thu, 14 Aug 2003, Jukka K. Korpela wrote:
The official-looking
http://www.neue-rechtschreibung.de/r...rk_zeichen.htm
There's nothing "official" about this site.
They even have, for their page that specifically
describes the norms for German orthography, a title element that
grossly violates that orthography:
<title>deutsche rechtschreibung</title>


Under #520, I read
Man setzt ein Apostroph ("one uses an apostrophe", with the wrong article)
But correct is
Man setzt einen Apostroph

Jul 20 '05 #12

In article <Pine.GSO.4.44.0308141434010.4339-100000@s5b004>,
nh******@rrzn-user.uni-hannover.de says...
On Thu, 14 Aug 2003, Philipp Lenssen wrote:
(...)
Because I just
gave some alternatives to my team and they preferred the medium-long
version (En-Dash). But it's not used to indicate a range of values.


Yes it is. You write e.g. 128-255 (128 to 255) with an en-dash.


You misunderstood me. I was referring to where the character (En-Dash)
is used within the site we're doing. And it's not used to indicate a
range of values, but to separate two parts of a sentence (the German
Gedankenstrich).

So now I'm left with some questions:

- Which is the right meta-tag to include in my header, in case I can't
fully administrate the server,
- What would I tell the administrator to do,
- On my own Apache server, can I do it myself (I don't fully
administrate it, but I can place my htaccess files),
- What tool allows me to check which character set is being sent, and
also lets me know if no character set information was included in the
HTTP header (I suppose I can do this with a browser)?
Jul 20 '05 #13

In article <MP************************@News.Individual.DE>,
ph*************@bb-k.com says...
(...)
So now I'm left with some questions:

- Which is the right meta-tag to include in my header, in case I can't
fully administrate the server,
- What would I tell the administrator to do,
- On my own Apache server, can I do it myself (I don't fully
administrate it, but I can place my htaccess files),
- What tool allows me to check which character set is being sent, and
also lets me know if no character set information was included in the
HTTP header (I suppose I can do this with a browser)?


Actually, two more questions:

- I'd prefer to use &ndash;, or isn't that as good as using &#8211;?
- On some pages I would change, the doctype is XHTML1 (Strict), I
suppose the issue is all the same?
Jul 20 '05 #14

Philipp Lenssen <ph*************@bb-k.com> wrote:
- I'd prefer to use &ndash;, or isn't that as good as using &#8211;?
Netscape 4.x will understand &#8211; but not &ndash; .
- On some pages I would change, the doctype is XHTML1 (Strict), I
suppose the issue is all the same?


Doesn't matter.
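A quick way to confirm that &ndash; and &#8211; denote the same character is to decode both with Python's html.unescape. Note that html.unescape follows the HTML5 parsing rules, which postdate this thread and, as error recovery, map &#128;-&#159; through Windows-1252; that is why &#150; also comes out as U+2013 here, even though HTML 4.01 itself does not define it as a dash:

```python
import html

# The same EN DASH written four ways; the last is the Windows-1252 "Microsoftism".
for ref in ["&ndash;", "&#8211;", "&#x2013;", "&#150;"]:
    ch = html.unescape(ref)
    print(f"{ref:<9} -> U+{ord(ch):04X}")
```

So the difference between &ndash; and &#8211; is one of browser support (Netscape 4.x, as noted above), not of meaning.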
Jul 20 '05 #15

Philipp Lenssen <ph*************@bb-k.com> wrote:
- Which is the right meta-tag to include in my header, in case I can't
fully administrate the server,
From the previous discussion, I assume you mean to set the encoding
("charset") of your pages.
- What would I tell the administrator to do,
- On my own Apache server, can I do it myself (I don't fully
administrate it, but I can place my htaccess files),
You have been pointed to http://www.w3.org/International/O-HTTP-charset
already in this thread.
- What tool allows me to check which character set is being sent, and
also lets me know if no character set information was included in the
HTTP header (I suppose I can do this with a browser)?


You should consult http://validator.w3.org/ anyway and it will tell you
about the encoding, too.
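Besides the validator, a short script can report where, if anywhere, a page declares its encoding. This sketch assumes you already have the Content-Type header value and the raw body bytes; it checks the HTTP header first (which takes precedence over META in HTML 4.01) and falls back to a <meta http-equiv> element, returning "none" when neither is present. The names find_charset and _MetaCharsetFinder are hypothetical:

```python
import re
from email.message import Message
from html.parser import HTMLParser
from typing import Optional, Tuple


class _MetaCharsetFinder(HTMLParser):
    """Remembers the charset from the first <meta http-equiv="Content-Type">."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if (self.charset is None and tag == "meta"
                and attributes.get("http-equiv", "").lower() == "content-type"):
            match = re.search(r"charset=([\w.:-]+)", attributes.get("content", ""), re.I)
            if match:
                self.charset = match.group(1).lower()


def find_charset(content_type: Optional[str], body: bytes) -> Tuple[str, Optional[str]]:
    """Return ('http', cs), ('meta', cs) or ('none', None)."""
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        charset = header.get_content_charset()  # lower-cased, or None
        if charset:
            return "http", charset
    finder = _MetaCharsetFinder()
    # Latin-1 decoding never fails and preserves ASCII markup, which is enough
    # to spot a <meta> element in any ASCII-compatible encoding.
    finder.feed(body.decode("latin-1"))
    finder.close()
    if finder.charset:
        return "meta", finder.charset
    return "none", None
```

Against a live page you might wrap it with urllib.request, for example: with urllib.request.urlopen(url) as resp: print(find_charset(resp.headers.get("Content-Type"), resp.read())). It does not look at byte-order marks or other declaration mechanisms.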

--
Top posting.
What's the most irritating thing on Usenet?
Jul 20 '05 #16

In article <MP************************@News.Individual.DE> in
comp.infosystems.www.authoring.html, Philipp Lenssen
<ph*************@bb-k.com> wrote:
OK. Now what's the German version of Em-Dash as described by you --
"used e.g. to make a break in the flow of a sentence"? Because I just
gave some alternatives to my team and they preferred the medium-long
version (En-Dash). But it's not used to indicate a range of values.
You might get a different answer if you used a different font.

But, with respect, I think what anyone "prefers" isn't really the
issue when you're writing in English. There are rules, and they
should be followed or your work will look amateurish. (Are you
publishing your Web site in English? I'll assume you are, but if
you're publishing it in German then the rest of this article does
not apply!)

One rule, for instance, is that you make a break in a sentence with
an em dash or (on a typewriter) two consecutive hyphens. I'm sorry,
but an en dash is really not correct for this in English, even if it
may be in German.

Another rule is that a range of values is indicated with an en dash.
"J.S. Bach, 1685-1750" should have an en dash, not a hyphen. On a
typewriter, the en dash is represented by a single hyphen.
The old site is actually using a single dash for the purpose of breaking
a sentence. It looks really wrong to me.
I don't know what you mean by "a single dash". If you mean a single
_hyphen_, character 45, then it doesn't just look wrong, it _is_
wrong. A single en dash is also wrong. But if you mean a single em
dash, it's right.
On a side-note, is it really wrong to use Em-Dash with spacing left and
right? 'Cause that's what I do so far.


This is more a matter of publishers' "house style" than an agreed
rule. Some publishers use thin spaces, others use regular spaces,
others use no space at all. (Older books used other punctuation
adjacent to the em dash; this is no longer considered correct.)

If you do use spaces, you probably want to do something like this:
the one thing&nbsp;&#8212; the only thing&nbsp;&#8212; is
That way your line can break after an em dash but not before it.
(Are there any browsers smart enough to break after the em dash but
not before, in something like
the one thing&#8212;the only thing&#8212;is
where there are no spaces around the em dashes?)
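Applying that &nbsp; trick by hand is tedious. A small sketch that rewrites spaced em dashes accordingly might look like this; it assumes the dash appears as the literal character, as &#8212; or as &mdash;, with ordinary spaces on both sides, and bind_dashes is a made-up name:

```python
import re

# A spaced em dash, written as the character itself, &#8212; or &mdash;.
SPACED_EM_DASH = re.compile(r" +(?:\u2014|&#8212;|&mdash;) +")


def bind_dashes(text: str) -> str:
    """Glue each spaced em dash to the preceding word with &nbsp;,
    so a browser may break the line after the dash but not before it."""
    return SPACED_EM_DASH.sub("&nbsp;&#8212; ", text)


print(bind_dashes("the one thing \u2014 the only thing \u2014 is"))
# prints: the one thing&nbsp;&#8212; the only thing&nbsp;&#8212; is
```

Unspaced dashes are left untouched, since whether a browser breaks around them depends on its line-breaking rules.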

Spaces are not usually written around the en dash.

Suggestion: If it's a matter of rules of English punctuation, as
opposed to how to represent punctuation in HTML, then
alt.usage.english is probably a better newsgroup. I'm _not_
crossposting this because maybe I've misinterpreted everything and
you're actually talking about German punctuation -- in which case
you probably haven't read this far. :-)

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
Jul 20 '05 #17

On Fri, 15 Aug 2003, Stan Brown wrote:
Like the business of ." and ". it seems -- in the US the period must
come inside the quotes, but UK practice is not so rigid.


Fowler's MEU 2nd edition ed. Gowers (dunno whether this was from the
original or from the editor of the 2nd edition) categorises two
distinct rule sets, the "conventional" and the "logical". The
"conventional" one corresponds to US usage.

UK practice is "much debated", but saying that it is "not so rigid"
would be misleading, since each school of thought is pretty rigid in
its position ;-}

The conventional one was AIUI developed because of physical
limitations of hot type.

The logical one seems to have much to commend it when no such
limitations supervene.

In the treatment of question and exclamation marks, MEU2 concludes
that the illogicality of the conventional approach is exposed, and
concludes that authors from both schools of thought therefore follow
the logical ordering with those marks.

Did you say "I am not my brother's keeper"?
I said "am I my brother's keeper?"

and so on.
Jul 20 '05 #18

In article <Xn*****************************@193.229.0.31>,
"Jukka K. Korpela" <jk******@cs.tut.fi> wrote:
I know that for Finnish, the official rules did not
originally make the distinction, and the current rules explicitly say
that you can choose whether you use EN DASH or EM DASH,
Those rules weren't written by typographers. They were written by
linguists whose attitude seems to have been "whatever as long as it is
not the hyphen".
which corresponds to the actual usage


But not necessarily with typographer opinion. There are a lot of
clueless actual use cases--many of which can be attributed to the
brokenness of the default "corrections" made by Microsoft Word or to the
bad design of keyboard layouts. (Typing the right characters on
non-Apple Finnish keyboard layouts is unnecessarily hard.)

--
Henri Sivonen
hs******@iki.fi
http://www.iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jul 20 '05 #19

Henri Sivonen <hs******@iki.fi> wrote:
Traditionally, Continental typography uses only the en-dash, with or
without spaces on both sides. An en-dash with spaces on both sides
is equivalent to the American em-dash without spaces.
En dashes without spaces meaning break of flow in Continental
typography?


Please read again what I wrote.
AFAIK, the common typographer opinion is:

American:
Range: en dash without spaces
Break of flow: em dash without spaces

European (including contemporary British):
Range: en dash without spaces
Break of flow: en dash with spaces


Exactly what I wrote.
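The two conventions summarized above can be written down as data. This is only a sketch of that summary (typographer opinion as described in this thread, not a formal standard), and the helper names number_range and break_flow are hypothetical:

```python
EN_DASH, EM_DASH = "\u2013", "\u2014"

# Summary of the common typographer opinion quoted above; not a formal standard.
STYLES = {
    "american": {"range": EN_DASH, "break": EM_DASH},         # no spaces
    "european": {"range": EN_DASH, "break": f" {EN_DASH} "},  # spaced en dash
}


def number_range(start: int, end: int, style: str = "american") -> str:
    """Format a range such as 1685-1750 with the style's range dash."""
    return f"{start}{STYLES[style]['range']}{end}"


def break_flow(before: str, after: str, style: str = "american") -> str:
    """Join two parts of a sentence with the style's break-of-flow dash."""
    return f"{before}{STYLES[style]['break']}{after}"


print(number_range(1685, 1750))
print(break_flow("the one thing", "the only thing", style="european"))
```

Both styles agree on the range dash; they differ only in the break-of-flow dash and its spacing.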
Jul 20 '05 #20
