Non-ASCII characters in web pages

I run a website for my very extended family. The site is not a static one,
and pages are frequently added and changed. I constructed it by myself, but
I can best be described as a casual and unsophisticated web designer.

Because I have always had difficulty in producing dashes on my pages, I
generally use double hyphens instead. Books that I possess or have seen on
HTML tell me that I could make e.g. an em-dash by using the escape sequence
(without the quotes) "&emdash;", but this is displayed *literally* on
browsers, not as an em-dash. A friend has told me now that I can make the
desired dashes with the strings "&#8211" and "&#8212" for en- and em-dashes
respectively, and sure enough this works.

These strings are very unintuitive (which is an understatement); there is no
obvious way to form a mnemonic to remember them. Why do books tell me the
easily remembered strings I have mentioned above? Has the spec changed? When
and why?

More importantly, is there a list somewhere on the Net that I could
download, and that would list all the other similar strings for non-ASCII
characters: quotes, spaces, diacritics, etc.?

--
Stan Goodman
Qiryat Tiv'on
Israel

To send me email, please replace the CAPITAL_LETTERS with "sig". Please do
not send me HTML-formatted messages. Please do not send me attachments
without telling me beforehand.

Jul 20 '05 #1
23 Replies


Stan Goodman:
Books that I possess or have seen on
HTML tell me that I could make e.g. an em-dash by using the escape sequence
(without the quotes) "&emdash;", but this is displayed *literally* on
browsers, not as an em-dash. A friend has told me now that I can make the
desired dashes with the strings "&#8211" and "&#8212" for en- and em-dashes
respectively, and sure enough this works.
It should be "mdash" (not "emdash"). Either your books are crap, or you
should get new glasses.
More importantly, is there a list somewhere on the Net that I could
download, and that would list all the other similar strings for non-ASCII
characters: quotes, spaces, diacritics, etc.?


Try the HTML 4.01 specification:

<URL:http://www.w3.org/TR/html40/sgml/entities.html>
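For anyone who would rather generate such a list locally than download it, here is a minimal sketch. It assumes Python 3 and its standard `html.entities` module, whose `name2codepoint` table holds the HTML 4 entity names. The helper name `entity_table` is made up for illustration and is not part of the specification.

```python
from html.entities import name2codepoint

def entity_table():
    """Return (named reference, numeric reference, character) rows by code point."""
    rows = []
    for name, codepoint in sorted(name2codepoint.items(), key=lambda item: item[1]):
        rows.append((f"&{name};", f"&#{codepoint};", chr(codepoint)))
    return rows

if __name__ == "__main__":
    # Print one line per entity: name form, decimal form, the character itself.
    for named, numeric, char in entity_table():
        print(f"{named:12} {numeric:10} {char}")
```

The output pairs each mnemonic form with its decimal form, so either can be looked up from the other.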

--
Bertilo Wennergren <be******@gmx.net> <http://www.bertilow.com>
Jul 20 '05 #2

Stan Goodman wrote:
Books that I possess or have seen on
HTML tell me that I could make e.g. an em-dash by using the escape sequence
(without the quotes) "&emdash;", but this is displayed *literally* on
browsers, not as an em-dash. A friend has told me now that I can make the
desired dashes with the strings "&#8211" and "&#8212" for en- and em-dashes
respectively, and sure enough this works.
Bertilo Wennergren <be******@gmx.net> wrote:
It should be "mdash" (not "emdash"). Either your books are crap, or you
should get new glasses.


To be fair, HTML 3.0 (RIP) specified &endash; and &emdash; rather than the
&ndash; and &mdash; specified by HTML 4.x and implemented by modern
browsers.
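A quick way to check which spellings a current parser recognises is Python's `html.unescape`, used here only as a stand-in for browser behaviour. That is an assumption: the function follows the HTML5 named character reference table, but it is not browser code, and the `samples` list is just the set of strings discussed in this thread.

```python
from html import unescape

samples = ["&mdash;", "&ndash;", "&#8212;", "&#8211;", "&emdash;", "&endash;"]

# Map each reference to what the parser makes of it.
decoded = {ref: unescape(ref) for ref in samples}

for ref, text in decoded.items():
    status = "decoded" if text != ref else "left as literal text"
    print(f"{ref:10} -> {text!r} ({status})")
```

References the table does not know are passed through unchanged, which matches the "displayed literally" symptom described in the question.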
More importantly, is there a list somewhere on the Net that I could
download, and that would list all the other similar strings for non-ASCII
characters: quotes, spaces, diacritics, etc.?


Try the HTML 4.01 specification:

<URL:http://www.w3.org/TR/html40/sgml/entities.html>


The situation is complicated by browser support, or lack thereof.

There are (old, obsolete) browsers that display &#8212; properly, but don't
display &mdash; properly. So by using &#8212; instead of &mdash;, you
improve the situation for readers using those browsers.

There are other (older, more obsolete) browsers that display neither
&#8212; nor &mdash; properly. When the character name is displayed "as is"
by such browsers, "mdash" might be more sensible than "#8212", and by using
&mdash; instead of &#8212;, you might improve the situation for readers
using those browsers.
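
If pages are generated by a script, either form can be produced mechanically. The sketch below assumes Python 3. The names `to_numeric_refs` and `to_named_refs` are hypothetical helpers, and the second falls back to a numeric reference when the HTML 4 table has no name for a character. Neither function escapes `&`, `<` or `>`.

```python
from html.entities import codepoint2name

def to_numeric_refs(text):
    """Replace every non-ASCII character with a decimal reference like &#8212;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

def to_named_refs(text):
    """Prefer HTML 4 entity names such as &mdash;; otherwise use numeric references."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # plain ASCII passes through untouched
        elif ord(ch) in codepoint2name:
            out.append(f"&{codepoint2name[ord(ch)]};")
        else:
            out.append(f"&#{ord(ch)};")
    return "".join(out)

# En dash, em dash and e-acute, written as escapes to keep this source ASCII.
sample = "1990\u20132004 \u2014 caf\u00e9"
print(to_numeric_refs(sample))
print(to_named_refs(sample))
```

Which function to use depends on which group of old browsers matters more, as described above.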

See also http://www.htmlhelp.com/faq/html/bas...l#special-char
--
Darin McGrew, mc****@stanfordalumni.org, http://www.rahul.net/mcgrew/
Web Design Group, da***@htmlhelp.com, http://www.HTMLHelp.com/

"Experience is something you don't get until just after you need it."
Jul 20 '05 #3

On Tue, 24 Feb 2004 00:59:46 +0100, Bertilo Wennergren <be******@gmx.net>
wrote:

It should be "mdash" (not "emdash").
This is true.
Either your books are crap, or you
should get new glasses.


This is insulting, and for no purpose.
More importantly, is there a list somewhere on the Net that I could
download, and that would list all the other similar strings for
non-ASCII
characters: quotes, spaces, diacritics, etc.?


Try the HTML 4.01 specification:

<URL:http://www.w3.org/TR/html40/sgml/entities.html>


I'll also give this very comprehensive link:

http://www.pemberley.com/janeinfo/latin1.html
Jul 20 '05 #4

Neal:
On Tue, 24 Feb 2004 00:59:46 +0100, Bertilo Wennergren <be******@gmx.net>
wrote:
It should be "mdash" (not "emdash"). This is true.

Either your books are crap, or you should get new glasses.

This is insulting, and for no purpose.


Sorry, I didn't mean to insult. It was a bad joke. I regret it.

--
Bertilo Wennergren <be******@gmx.net> <http://www.bertilow.com>
Jul 20 '05 #5

On 23 Feb 2004 23:31:11 GMT, "Stan Goodman" <SP*********@hashkedim.com>
wrote:
Because I have always had difficulty in producing dashes on my pages, I
generally use double hyphens instead. Books that I possess or have seen on
HTML tell me that I could make e.g. an em-dash by using the escape sequence
(without the quotes) "&emdash;", but this is displayed *literally* on
browsers, not as an em-dash. A friend has told me now that I can make the
desired dashes with the strings "&#8211" and "&#8212" for en- and em-dashes
respectively, and sure enough this works.

These strings are very unintuitive (which is an understatement); there is no
obvious way to form a mnemonic to remember them. Why do books tell me the
easily remembered strings I have mentioned above? Has the spec changed? When
and why?


To what others have said, I would add:

- At least the commoner characters are supported in the mnemonic form
(character entity reference) by most/all browsers newer than Netscape 4.

- I use a couple of sed scripts to produce these characters myself. They
are available on my site if you'd like to try them:
http://www.xs4all.nl/~sbpoley/webmat...er_quotes.html
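
As a rough illustration of the same idea (this is not the sed scripts mentioned above, which are not reproduced here), a small Python filter can turn typed hyphen runs into dash entities. It assumes the LaTeX-style convention that `--` means an en dash and `---` an em dash. `dashes_to_entities` is a hypothetical name, and the lookarounds are there so that HTML comment delimiters are left alone.

```python
import re

def dashes_to_entities(text):
    """Turn typewriter-style '---' and '--' into &mdash; and &ndash;."""
    # Handle the longer run first so '---' is not read as '--' plus '-'.
    # The lookarounds skip longer hyphen runs and the '<!--' / '-->' delimiters.
    text = re.sub(r"(?<![-!])---(?![->])", "&mdash;", text)
    text = re.sub(r"(?<![-!])--(?![->])", "&ndash;", text)
    return text

print(dashes_to_entities("pages 3--5, and then---at last---the end"))
```

A page that uses `--` for every dash would need a different mapping, so the convention should be checked against the existing text before running anything like this over a whole site.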

--
Stephen Poley

http://www.xs4all.nl/~sbpoley/webmatters/
Jul 20 '05 #6

On 23 Feb 2004, Stan Goodman wrote:
A friend has told me now that I can make the
desired dashes with the strings "&#8211" and "&#8212" for en- and em-dashes
respectively, and sure enough this works.
In addition to the other answers, see also
<http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist.html#NoteUTF>
More importantly, is there a list somewhere on the Net that I could
download, and that would list all the other similar strings for non-ASCII
characters: quotes, spaces, diacritics, etc.?


For example <http://www.unics.uni-hannover.de/nhtcapri/multilingual2.html>

Jul 20 '05 #7

On Mon, 23 Feb 2004 23:59:46 UTC, Bertilo Wennergren <be******@gmx.net>
opined:
Stan Goodman:
Books that I possess or have seen on
HTML tell me that I could make e.g. an em-dash by using the escape sequence
(without the quotes) "&emdash;", but this is displayed *literally* on
browsers, not as an em-dash. A friend has told me now that I can make the
desired dashes with the strings "&#8211" and "&#8212" for en- and em-dashes
respectively, and sure enough this works.


It should be "mdash" (not "emdash"). Either your books are crap, or you
should get new glasses.
More importantly, is there a list somewhere on the Net that I could
download, and that would list all the other similar strings for non-ASCII
characters: quotes, spaces, diacritics, etc.?


Try the HTML 4.01 specification:

<URL:http://www.w3.org/TR/html40/sgml/entities.html>


Thank you,

--
Stan Goodman
Qiryat Tiv'on
Israel

Saddam is gone. Ceterum, censeo Arafat esse delendum.

To send me email, please replace the CAPITAL_LETTERS with "sig". Please do
not send me HTML-formatted messages. Please do not send me attachments
without telling me beforehand.

Jul 20 '05 #8

On Tue, 24 Feb 2004 01:06:53 UTC, Darin McGrew <mc****@stanfordalumni.org>
opined:
Stan Goodman wrote:
Books that I possess or have seen on
HTML tell me that I could make e.g. an em-dash by using the escape sequence
(without the quotes) "&emdash;", but this is displayed *literally* on
browsers, not as an em-dash. A friend has told me now that I can make the
desired dashes with the strings "&#8211" and "&#8212" for en- and em-dashes
respectively, and sure enough this works.
Bertilo Wennergren <be******@gmx.net> wrote:
It should be "mdash" (not "emdash"). Either your books are crap, or you
should get new glasses.


To be fair, HTML 3.0 (RIP) specified &endash; and &emdash; rather than the
&ndash; and &mdash; specified by HTML 4.x and implemented by modern
browsers.
More importantly, is there a list somewhere on the Net that I could
download, and that would list all the other similar strings for non-ASCII
characters: quotes, spaces, diacritics, etc.?


Try the HTML 4.01 specification:

<URL:http://www.w3.org/TR/html40/sgml/entities.html>


The situation is complicated by browser support, or lack thereof.

There are (old, obsolete) browsers that display &#8212; properly, but don't
display &mdash; properly. So by using &#8212; instead of &mdash;, you
improve the situation for readers using those browsers.

There are other (older, more obsolete) browsers that display neither
&#8212; nor &mdash; properly. When the character name is displayed "as is"
by such browsers, "mdash" might be more sensible than "#8212", and by using
&mdash; instead of &#8212;, you might improve the situation for readers
using those browsers.

See also http://www.htmlhelp.com/faq/html/bas...l#special-char


I am grateful to you for your fuller (and, incidentally, more temperate)
explanation of the situation. Evidently, the book most readily available to
me may not be actual crap, but merely obsolete (HTML v3.2), and I still
do not need glasses. A reading of my query will show that a change in the
specification was exactly what I was asking about.

--
Stan Goodman
Qiryat Tiv'on
Israel

Saddam is gone. Ceterum, censeo Arafat esse delendum.

To send me email, please replace the CAPITAL_LETTERS with "sig". Please do
not send me HTML-formatted messages. Please do not send me attachments
without telling me beforehand.

Jul 20 '05 #9

On Tue, 24 Feb 2004 10:42:28 UTC, Bertilo Wennergren <be******@gmx.net>
opined:
Neal:
On Tue, 24 Feb 2004 00:59:46 +0100, Bertilo Wennergren <be******@gmx.net>
wrote:
It should be "mdash" (not "emdash").

This is true.

Either your books are crap, or you should get new glasses.

This is insulting, and for no purpose.


Sorry, I didn't mean to insult. It was a bad joke. I regret it.


All is forgiven.

--
Stan Goodman
Qiryat Tiv'on
Israel

Saddam is gone. Ceterum, censeo Arafat esse delendum.

To send me email, please replace the CAPITAL_LETTERS with "sig". Please do
not send me HTML-formatted messages. Please do not send me attachments
without telling me beforehand.

Jul 20 '05 #10

"Stan Goodman" <SP*********@hashkedim.com> wrote:
Evidently, the book most readily available to
me may not be actual crap, but merely obsolete (HTML v3.2)
No, &emdash; and &endash; were not in HTML 3.2 (which had no entities
that would expand to em dash or en dash).
A reading of my query will show
that a change in the specification was exactly what I was asking
about.


There was no change in the specification on this issue.

Darin wrote that "HTML 3.0 (RIP) specified &endash; and &emdash;", but
I am unable to find such entities in the HTML 3.0 draft. Anyway, if
they were there, it would be rather old info - the HTML 3.0 draft
expired in 1995 - and it would not have been a change in any
specification, since HTML 3.0 was just an incomplete draft.

It is my understanding that &endash; and &emdash; were just some
browser's invention, years ago.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #11

Jukka K. Korpela <jk******@cs.tut.fi> wrote:
Darin wrote that "HTML 3.0 (RIP) specified &endash; and &emdash;", but
I am unable to find such entities in the HTML 3.0 draft.
They're mentioned here: http://www.w3.org/MarkUp/html3/specialchars.html
It is my understanding that &endash; and &emdash; were just some
browser's invention, years ago.


The LaTeX manual refers to "endash" and "emdash" when describing the
characters generated from the input "--" and "---", so there was at least
some precedent.
--
Darin McGrew, mc****@stanfordalumni.org, http://www.rahul.net/mcgrew/
Web Design Group, da***@htmlhelp.com, http://www.HTMLHelp.com/

"Who is General Failure and why is he reading my hard disk?"
Jul 20 '05 #12

It seems "Andreas Prilop" wrote in
comp.infosystems.www.authoring.html:

For example <http://www.unics.uni-hannover.de/nhtcapri/multilingual2.html>


There's no text after the <h2>Mathematical Symbols</h2>

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
2.1 changes: http://www.w3.org/TR/CSS21/changes.html
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #13

On Wed, 25 Feb 2004 00:17:49 -0500, Stan Brown
<th************@fastmail.fm> declared in
comp.infosystems.www.authoring.html:
It seems "Andreas Prilop" wrote in
comp.infosystems.www.authoring.html:

For example <http://www.unics.uni-hannover.de/nhtcapri/multilingual2.html>

There's no text after the <h2>Mathematical Symbols</h2>


No, it's a link to a separate page.

--
Mark Parnell
http://www.clarkecomputers.com.au
Jul 20 '05 #14

Darin McGrew <mc****@stanfordalumni.org> wrote:
Jukka K. Korpela <jk******@cs.tut.fi> wrote:
Darin wrote that "HTML 3.0 (RIP) specified &endash; and &emdash;",
but I am unable to find such entities in the HTML 3.0 draft.
They're mentioned here:
http://www.w3.org/MarkUp/html3/specialchars.html


Thanks for the info. I had looked at the parts that specifically list
entities, and didn't find them there. It really was an incomplete
draft!
The LaTeX manual refers to "endash" and "emdash" when describing
the characters generated from the input "--" and "---", so there
was at least some precedent.


And the names themselves are natural.

But entity names in HTML have generally been taken from the entity sets
in appendix D of the SGML standard. The names that were actually taken
into HTML specifications follow this principle. This is why they look
so odd, half-mnemonic and irregular. (E.g., em dash is "&mdash;" but
em space is "&emsp;". Actually some people might say this is logical in
an odd way, since em space is by Unicode definition a space with a
fixed width of em, the font size, whereas em dash might vary in width,
though historically and in common font design it has the width of one
em.)
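
To see the irregularity side by side, the following sketch pairs each entity name with the Unicode name of the character it stands for. It assumes Python 3's standard `html.entities` and `unicodedata` modules. The `pairs` dictionary is only an illustrative variable.

```python
import unicodedata
from html.entities import name2codepoint

# Entity name -> (code point, Unicode character name)
pairs = {}
for entity in ("ndash", "mdash", "ensp", "emsp"):
    codepoint = name2codepoint[entity]
    pairs[entity] = (codepoint, unicodedata.name(chr(codepoint)))

for entity, (codepoint, uniname) in pairs.items():
    print(f"&{entity};  U+{codepoint:04X}  {uniname}")
```

The dashes drop the leading "e" of "en" and "em" while the spaces keep it, which is exactly the half-mnemonic pattern described above.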

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #15

It seems "Mark Parnell" wrote in
comp.infosystems.www.authoring.html:
On Wed, 25 Feb 2004 00:17:49 -0500, Stan Brown
<th************@fastmail.fm> declared in
comp.infosystems.www.authoring.html:
It seems "Andreas Prilop" wrote in
comp.infosystems.www.authoring.html:

For example <http://www.unics.uni-hannover.de/nhtcapri/multilingual2.html>

There's no text after the <h2>Mathematical Symbols</h2>


No, it's a link to a separate page.


Talk about non-obvious navigation!

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
2.1 changes: http://www.w3.org/TR/CSS21/changes.html
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #16

On Wed, 25 Feb 2004, Stan Brown wrote:
For example <http://www.unics.uni-hannover.de/nhtcapri/multilingual2.html>


There's no text after the <h2>Mathematical Symbols</h2>


I wonder from where you got <h2>Mathematical Symbols</h2> ?
The source reads
<h2 class="noprint"><a href="mathematics.html">Mathematical symbols</a></h2>

Jul 20 '05 #17

On Wed, 25 Feb 2004, Jukka K. Korpela wrote:
But entity names in HTML have generally been taken from the entity sets
in appendix D of the SGML standard. The names that were actually taken
into HTML specifications follow this principle. This is why they look
so odd, half-mnemonic and irregular. (E.g., em dash is "&mdash;" but
em space is "&emsp;".


My understanding was that the length of entities in SGML was limited to
5 letters, therefore &acute; but &uml; and &circ; . Compositions with
base letters may have 6 letters, like &eacute; .

Jul 20 '05 #18

Andreas Prilop <nh******@rrzn-user.uni-hannover.de> wrote:
My understanding was that the length of entities in SGML was
limited to 5 letters, therefore &acute; but &uml; and &circ; .
The SGML reference concrete syntax sets NAMELEN to 8, so I don't think
that's the explanation. (HTML for example, when formally defined as an
SGML application, sets NAMELEN to 65536.)

And you actually present some counterevidence:
Compositions with base letters may have 6 letters, like &eacute; .


Besides, there's e.g. &middot;, which is in HTML too.

Actually they say in clause D.4.1.3 of the SGML standard:

"The entity names are derived from the English language.
They were chosen for a maximum mnemonic value, consistent
with the logical and systematic use of abbreviations."

(Sounds funny, doesn't it? Anyone who regards the entity names as
mnemonic has a memory that works in odd ways.) And:

"The entity names are limited to six characters in length - -".

So it's just an odd restriction they decided to impose, with no good
reason within SGML itself. We old-timers might compute 1+6+1 = 8 and
guess that the idea was that the entity reference as a whole, including
"&" and ";", would fit into a doubleword for efficiency.
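
The six-character limit can be checked against the entity names that ended up in HTML. This sketch assumes Python 3's `html.entities.name2codepoint` table, which reflects the HTML 4 entity lists and not the SGML appendix itself. It treats "Latin-1 entities" as those whose code points fall in U+00A0 to U+00FF.

```python
from html.entities import name2codepoint

# Latin-1 entities: names whose code points lie in U+00A0..U+00FF.
latin1_names = [n for n, cp in name2codepoint.items() if 0xA0 <= cp <= 0xFF]
longest_latin1 = max(latin1_names, key=len)

# Names anywhere in the HTML 4 table that exceed six characters.
over_six = sorted(n for n in name2codepoint if len(n) > 6)

print(len(longest_latin1), longest_latin1)
print(over_six)
```

The Latin-1 names all stay within six characters, while the wider HTML 4 table also contains at least one longer name, `thetasym`.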

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #19

"Jukka K. Korpela" <jk******@cs.tut.fi> wrote:
My understanding was that the length of entities in SGML was
limited to 5 letters, therefore &acute; but &uml; and &circ; .


And you actually present some counterevidence:


Our five letters are "a c u t e" and "e" .... our six letters are ....
I'll come in again.

You are right. Several of the Latin-1 entities show the abbreviation
with six letters:
&curren; &brvbar; &plusmn; &frac12; &iquest;
Jul 20 '05 #20

It seems "Andreas Prilop" wrote in
comp.infosystems.www.authoring.html:
On Wed, 25 Feb 2004, Stan Brown wrote:
For example <http://www.unics.uni-hannover.de/nhtcapri/multilingual2.html>
There's no text after the <h2>Mathematical Symbols</h2>


I guessed -- it was the same size as the other major headings on the
page. <h2> is the usual way to do a major heading.
I wonder from where you got <h2>Mathematical Symbols</h2> ?
The source reads
<h2 class="noprint"><a href="mathematics.html">Mathematical symbols</a></h2>


I am sure you are not seriously suggesting that visitors should have
to view source of a page to determine whether something is a link!

I think the (unidentified) author of that page goofed. A heading
logically should be followed by text. If it's a link with no text
then it's not logically a heading -- and if it's a link it should
look like one.

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
2.1 changes: http://www.w3.org/TR/CSS21/changes.html
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #21

On Wed, 25 Feb 2004 09:19:04 -0500, Stan Brown
<th************@fastmail.fm> declared in
comp.infosystems.www.authoring.html:
Talk about non-obvious navigation!


Took me a while to work it out. Sure, it was blue and the other headings
were black, but from memory the only reason I realised was because I
happened to run my mouse over it. :-S

--
Mark Parnell
http://www.clarkecomputers.com.au
Jul 20 '05 #22

It seems "Mark Parnell" wrote in
comp.infosystems.www.authoring.html:
On Wed, 25 Feb 2004 09:19:04 -0500, Stan Brown
<th************@fastmail.fm> declared in
comp.infosystems.www.authoring.html:
Talk about non-obvious navigation!


Took me a while to work it out. Sure, it was blue and the other headings
were black, but from memory the only reason I realised was because I
happened to run my mouse over it. :-S


If the page gave any indication of its authorship, I'd send the
author a note.

It never ceases to amaze me how many pages give absolutely no
contact information, not even a link to it. Sometimes by shortening
the URL one can stumble onto a home page, but lately that seems to
lead more often to a "not found" or "forbidden", so I usually don't
bother.

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
2.1 changes: http://www.w3.org/TR/CSS21/changes.html
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #23

Stan Brown <th************@fastmail.fm> wrote:
I wonder from where you got <h2>Mathematical Symbols</h2> ?
The source reads
<h2 class="noprint"><a href="mathematics.html">Mathematical
symbols</a></h2>
I am sure you are not seriously suggesting that visitors should
have to view source of a page to determine whether something is a
link!


I'm afraid that's what we may need to do at times, as users. :-(
I think the (unidentified) author of that page goofed. A heading
logically should be followed by text. If it's a link with no text
then it's not logically a heading -- and if it's a link it should
look like one.


Well, yes. What I would do is to keep the heading, removing the link
markup there, and adding the following after it:

<div>See separate page <cite><a href="mathematics.html"
rel="next">Mathematical formulas in HTML 4.0</a></cite>.</div>

(probably wrapping this together with the heading inside
<div class="noprint">...</div>)

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #24
