
July 20th, 2005, 06:48 PM
| | | Non-ASCII characters in web pages
I run a website for my very extended family. The site is not a static one,
and pages are frequently added and changed. I constructed it by myself, but
I can best be described as a casual and unsophisticated web designer.
Because I have always had difficulty in producing dashes on my pages, I
generally use double hyphens instead. Books that I possess or have seen on
HTML tell me that I could make e.g. an em-dash by using the escape sequence
(without the quotes) "&emdash;", but this is displayed *literally* on
browsers, not as an em-dash. A friend has told me now that I can make the
desired dashes with the strings "&#8211;" and "&#8212;" for en- and em-dashes
respectively, and sure enough this works.
These strings are very unintuitive (which is an understatement); there is no
obvious way to form a mnemonic to remember them. Why do books tell me the
easily remembered strings I have mentioned above? Has the spec changed? When
and why?
More importantly, is there a list somewhere on the Net that I could
download, and that would list all the other similar strings for non-ASCII
characters: quotes, spaces, diacritics, etc.?
--
Stan Goodman
Qiryat Tiv'on
Israel
To send me email, please replace the CAPITAL_LETTERS with "sig". Please do
not send me HTML-formatted messages. Please do not send me attachments
without telling me beforehand. | 
July 20th, 2005, 06:48 PM
| | | Re: Non-ASCII characters in web pages
Stan Goodman:
> Books that I possess or have seen on
> HTML tell me that I could make e.g. an em-dash by using the escape sequence
> (without the quotes) "&emdash;", but this is displayed *literally* on
> browsers, not as an em-dash. A friend has told me now that I can make the
> desired dashes with the strings "&#8211;" and "&#8212;" for en- and em-dashes
> respectively, and sure enough this works.
It should be "mdash" (not "emdash"). Either your books are crap, or you
should get new glasses.
> More importantly, is there a list somewhere on the Net that I could
> download, and that would list all the other similar strings for non-ASCII
> characters: quotes, spaces, diacritics, etc.?
Try the HTML 4.01 specification:
<URL:http://www.w3.org/TR/html40/sgml/entities.html>
--
Bertilo Wennergren <bertilow@gmx.net> <http://www.bertilow.com> | 
July 20th, 2005, 06:48 PM
| | | Re: Non-ASCII characters in web pages
Stan Goodman wrote:
>> Books that I possess or have seen on
>> HTML tell me that I could make e.g. an em-dash by using the escape sequence
>> (without the quotes) "&emdash;", but this is displayed *literally* on
>> browsers, not as an em-dash. A friend has told me now that I can make the
>> desired dashes with the strings "&#8211;" and "&#8212;" for en- and em-dashes
>> respectively, and sure enough this works.
Bertilo Wennergren <bertilow@gmx.net> wrote:
> It should be "mdash" (not "emdash"). Either your books are crap, or you
> should get new glasses.
To be fair, HTML 3.0 (RIP) specified &endash; and &emdash; rather than the
&ndash; and &mdash; specified by HTML 4.x and implemented by modern
browsers.
>> More importantly, is there a list somewhere on the Net that I could
>> download, and that would list all the other similar strings for non-ASCII
>> characters: quotes, spaces, diacritics, etc.?
>
> Try the HTML 4.01 specification:
>
> <URL:http://www.w3.org/TR/html40/sgml/entities.html>
The situation is complicated by browser support, or lack thereof.
There are (old, obsolete) browsers that display &#8212; properly, but don't
display &mdash; properly. So by using &#8212; instead of &mdash;, you
improve the situation for readers using those browsers.
There are other (older, more obsolete) browsers that display neither
&#8212; nor &mdash; properly. When the character name is displayed "as is"
by such browsers, "mdash" might be more sensible than "#8212", and by using
&mdash; instead of &#8212;, you might improve the situation for readers
using those browsers.
See also http://www.htmlhelp.com/faq/html/bas...l#special-char
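For anyone who wants to check the named and numeric forms side by side today, here is a minimal Python sketch. It assumes the standard-library `html.unescape` as a stand-in for a current parser; that function follows HTML5 rules, so it says nothing about the obsolete browsers described above. It only illustrates that `&mdash;` and `&#8212;` name the same character, while `&emdash;` is not a defined name and passes through literally.

```python
import html

# Named and numeric references for the same character decode identically.
named = html.unescape("&mdash;")
numeric = html.unescape("&#8212;")
en_dash = html.unescape("&ndash;")

# "&emdash;" is not a defined entity name, so the text is left untouched,
# which matches the "displayed literally" symptom from the original question.
unknown = html.unescape("&emdash;")

print(named == numeric == "\u2014")
print(en_dash == "\u2013")
print(unknown)
```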
--
Darin McGrew, mcgrew@stanfordalumni.org, http://www.rahul.net/mcgrew/
Web Design Group, darin@htmlhelp.com, http://www.HTMLHelp.com/
"Experience is something you don't get until just after you need it." | 
July 20th, 2005, 06:48 PM
| | | Re: Non-ASCII characters in web pages
On Tue, 24 Feb 2004 00:59:46 +0100, Bertilo Wennergren <bertilow@gmx.net>
wrote:
> It should be "mdash" (not "emdash").
This is true.
> Either your books are crap, or you
> should get new glasses.
This is insulting, and for no purpose.
>> More importantly, is there a list somewhere on the Net that I could
>> download, and that would list all the other similar strings for non-ASCII
>> characters: quotes, spaces, diacritics, etc.?
>
> Try the HTML 4.01 specification:
>
> <URL:http://www.w3.org/TR/html40/sgml/entities.html>
>
I'll also give this very comprehensive link: http://www.pemberley.com/janeinfo/latin1.html | 
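As a rough complement to the lists linked above: Python's standard library carries the HTML 4 entity names in `html.entities.name2codepoint`, so a local table can be generated rather than downloaded. The sketch below prints a few named references next to their numeric equivalents. The helper `numeric_reference` is a made-up name for illustration, not part of any library.

```python
from html.entities import name2codepoint


def numeric_reference(name: str) -> str:
    """Return the decimal numeric character reference for an HTML 4 entity name."""
    return f"&#{name2codepoint[name]};"


# Print named form, numeric form, and the character itself for a few entities.
for name in ("ndash", "mdash", "emsp", "eacute", "middot"):
    print(f"&{name};  {numeric_reference(name)}  {chr(name2codepoint[name])}")

# "emdash" is not in the table, which is why "&emdash;" is not recognised.
print("emdash" in name2codepoint)
```

Looping over `sorted(name2codepoint)` instead of the short tuple would list every name in the table.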
July 20th, 2005, 06:49 PM
| | | Re: Non-ASCII characters in web pages
Neal:
> On Tue, 24 Feb 2004 00:59:46 +0100, Bertilo Wennergren <bertilow@gmx.net>
> wrote:
>> It should be "mdash" (not "emdash").
> This is true.
>> Either your books are crap, or you should get new glasses.
> This is insulting, and for no purpose.
Sorry, I didn't mean to insult. It was a bad joke. I regret it.
--
Bertilo Wennergren <bertilow@gmx.net> <http://www.bertilow.com> | 
July 20th, 2005, 06:49 PM
| | | Re: Non-ASCII characters in web pages
On 23 Feb 2004 23:31:11 GMT, "Stan Goodman" <SPAM_FOILER@hashkedim.com>
wrote:
>Because I have always had difficulty in producing dashes on my pages, I
>generally use double hyphens instead. Books that I possess or have seen on
>HTML tell me that I could make e.g. an em-dash by using the escape sequence
>(without the quotes) "&emdash;", but this is displayed *literally* on
>browsers, not as an em-dash. A friend has told me now that I can make the
>desired dashes with the strings "&#8211;" and "&#8212;" for en- and em-dashes
>respectively, and sure enough this works.
>
>These strings are very unintuitive (which is an understatement); there is no
>obvious way to form a mnemonic to remember them. Why do books tell me the
>easily remembered strings I have mentioned above? Has the spec changed? When
>and why?
To what others have said, I would add:
- At least the commoner characters are supported in the mnemonic form
(character entity reference) by most/all browsers newer than Netscape 4.
- I use a couple of sed scripts to produce these characters myself. They
are available on my site if you'd like to try them: http://www.xs4all.nl/~sbpoley/webmat...er_quotes.html
--
Stephen Poley http://www.xs4all.nl/~sbpoley/webmatters/ | 
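The sed scripts themselves are not reproduced in this thread. Purely as an illustration of the general idea, and not Stephen's actual scripts, here is a minimal Python sketch that turns a spaced double hyphen into `&mdash;`. The function name is invented, and the rule (only ` -- ` with a space on each side is converted) is an assumption chosen so that HTML comment delimiters are left alone.

```python
def double_hyphen_to_mdash(text: str) -> str:
    """Replace ' -- ' (hyphens with a space on each side) with '&mdash;'.

    Assumption: the author types dashes as a spaced double hyphen.
    '<!--' and '-->' never have a space on both sides of the hyphens,
    so HTML comments are not affected by this rule.
    """
    return text.replace(" -- ", "&mdash;")


print(double_hyphen_to_mdash("I guessed -- it was the same size"))
print(double_hyphen_to_mdash("<!-- a comment -->"))
```

A real script would probably also need to skip `<pre>` blocks and attribute values; this sketch does not attempt that.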
July 20th, 2005, 06:49 PM
| | | Re: Non-ASCII characters in web pages
On 23 Feb 2004, Stan Goodman wrote:
> A friend has told me now that I can make the
> desired dashes with the strings "&#8211;" and "&#8212;" for en- and em-dashes
> respectively, and sure enough this works.
In addition to the other answers, see also
<http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist.html#NoteUTF>
> More importantly, is there a list somewhere on the Net that I could
> download, and that would list all the other similar strings for non-ASCII
> characters: quotes, spaces, diacritics, etc.?
For example <http://www.unics.uni-hannover.de/nhtcapri/multilingual2.html> | 
July 20th, 2005, 06:49 PM
| | | Re: Non-ASCII characters in web pages
On Mon, 23 Feb 2004 23:59:46 UTC, Bertilo Wennergren <bertilow@gmx.net>
opined:
> Stan Goodman:
>
> > Books that I possess or have seen on
> > HTML tell me that I could make e.g. an em-dash by using the escape sequence
> > (without the quotes) "&emdash;", but this is displayed *literally* on
> > browsers, not as an em-dash. A friend has told me now that I can make the
> > desired dashes with the strings "&#8211;" and "&#8212;" for en- and em-dashes
> > respectively, and sure enough this works.
>
> It should be "mdash" (not "emdash"). Either your books are crap, or you
> should get new glasses.
>
> > More importantly, is there a list somewhere on the Net that I could
> > download, and that would list all the other similar strings for non-ASCII
> > characters: quotes, spaces, diacritics, etc.?
>
> Try the HTML 4.01 specification:
>
> <URL:http://www.w3.org/TR/html40/sgml/entities.html>
Thank you,
--
Stan Goodman
Qiryat Tiv'on
Israel
Saddam is gone. Ceterum, censeo Arafat esse delendum.
To send me email, please replace the CAPITAL_LETTERS with "sig". Please do
not send me HTML-formatted messages. Please do not send me attachments
without telling me beforehand. | 
July 20th, 2005, 06:49 PM
| | | Re: Non-ASCII characters in web pages
On Tue, 24 Feb 2004 01:06:53 UTC, Darin McGrew <mcgrew@stanfordalumni.org>
opined:
> Stan Goodman wrote:
> >> Books that I possess or have seen on
> >> HTML tell me that I could make e.g. an em-dash by using the escape sequence
> >> (without the quotes) "&emdash;", but this is displayed *literally* on
> >> browsers, not as an em-dash. A friend has told me now that I can make the
> >> desired dashes with the strings "&#8211;" and "&#8212;" for en- and em-dashes
> >> respectively, and sure enough this works.
>
> Bertilo Wennergren <bertilow@gmx.net> wrote:
> > It should be "mdash" (not "emdash"). Either your books are crap, or you
> > should get new glasses.
>
> To be fair, HTML 3.0 (RIP) specified &endash; and &emdash; rather than the
> &ndash; and &mdash; specified by HTML 4.x and implemented by modern
> browsers.
>
> >> More importantly, is there a list somewhere on the Net that I could
> >> download, and that would list all the other similar strings for non-ASCII
> >> characters: quotes, spaces, diacritics, etc.?
> >
> > Try the HTML 4.01 specification:
> >
> > <URL:http://www.w3.org/TR/html40/sgml/entities.html>
>
> The situation is complicated by browser support, or lack thereof.
>
> There are (old, obsolete) browsers that display &#8212; properly, but don't
> display &mdash; properly. So by using &#8212; instead of &mdash;, you
> improve the situation for readers using those browsers.
>
> There are other (older, more obsolete) browsers that display neither
> &#8212; nor &mdash; properly. When the character name is displayed "as is"
> by such browsers, "mdash" might be more sensible than "#8212", and by using
> &mdash; instead of &#8212;, you might improve the situation for readers
> using those browsers.
>
> See also http://www.htmlhelp.com/faq/html/bas...l#special-char
I am grateful to you for your fuller (and, incidentally, more temperate)
explanation of the situation. Evidently, the book most readily available to
me may not be actual crap, but merely obsolete (HTML v3.2), and I still
do not need glasses. A reading of my query will show that a change in the
specification was exactly what I was asking about.
--
Stan Goodman
Qiryat Tiv'on
Israel
Saddam is gone. Ceterum, censeo Arafat esse delendum.
To send me email, please replace the CAPITAL_LETTERS with "sig". Please do
not send me HTML-formatted messages. Please do not send me attachments
without telling me beforehand. | 
July 20th, 2005, 06:49 PM
| | | Re: Non-ASCII characters in web pages
On Tue, 24 Feb 2004 10:42:28 UTC, Bertilo Wennergren <bertilow@gmx.net>
opined:
> Neal:
>
> > On Tue, 24 Feb 2004 00:59:46 +0100, Bertilo Wennergren <bertilow@gmx.net>
> > wrote:
> >> It should be "mdash" (not "emdash").
> > This is true.
>
> >> Either your books are crap, or you should get new glasses.
>
> > This is insulting, and for no purpose.
>
> Sorry, I didn't mean to insult. It was a bad joke. I regret it.
All is forgiven.
--
Stan Goodman
Qiryat Tiv'on
Israel
Saddam is gone. Ceterum, censeo Arafat esse delendum.
To send me email, please replace the CAPITAL_LETTERS with "sig". Please do
not send me HTML-formatted messages. Please do not send me attachments
without telling me beforehand. | 
July 20th, 2005, 06:49 PM
| | | Re: Non-ASCII characters in web pages
"Stan Goodman" <SPAM_FOILER@hashkedim.com> wrote:
> Evidently, the book most readily available to
> me may not be actual crap, but merely obsolete (HTML v3.2)
No, &emdash; and &endash; were not in HTML 3.2 (which had no entities
that would expand to em dash or en dash).
> A reading of my query will show
> that a change in the specification was exactly what I was asking
> about.
There was no change in the specification in this issue.
Darin wrote that "HTML 3.0 (RIP) specified &endash; and &emdash;", but
I am unable to find such entities in the HTML 3.0 draft. Anyway, if
they were there, it would be rather old info - the HTML 3.0 draft
expired in 1995 - and it would not have been a change in any
specification, since HTML 3.0 was just an incomplete draft.
It is my understanding that &endash; and &emdash; were just some
browser's invention, years ago.
--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html | 
July 20th, 2005, 06:49 PM
| | | Re: Non-ASCII characters in web pages
Jukka K. Korpela <jkorpela@cs.tut.fi> wrote:
> Darin wrote that "HTML 3.0 (RIP) specified &endash; and &emdash;", but
> I am unable to find such entities in the HTML 3.0 draft.
They're mentioned here: http://www.w3.org/MarkUp/html3/specialchars.html
> It is my understanding that &endash; and &emdash; were just some
> browser's invention, years ago.
The LaTeX manual refers to "endash" and "emdash" when describing the
characters generated from the input "--" and "---", so there was at least
some precedent.
--
Darin McGrew, mcgrew@stanfordalumni.org, http://www.rahul.net/mcgrew/
Web Design Group, darin@htmlhelp.com, http://www.HTMLHelp.com/
"Who is General Failure and why is he reading my hard disk?" | 
July 20th, 2005, 06:49 PM
| | | Re: Non-ASCII characters in web pages
On Wed, 25 Feb 2004 00:17:49 -0500, Stan Brown
<the_stan_brown@fastmail.fm> declared in
comp.infosystems.www.authoring.html:
> It seems "Andreas Prilop" wrote in
> comp.infosystems.www.authoring.html:
>>
>>For example <http://www.unics.uni-hannover.de/nhtcapri/multilingual2.html>
>>
> There's no text after the <h2>Mathematical Symbols</h2>
No, it's a link to a separate page.
--
Mark Parnell http://www.clarkecomputers.com.au | 
July 20th, 2005, 06:49 PM
| | | Re: Non-ASCII characters in web pages
Darin McGrew <mcgrew@stanfordalumni.org> wrote:
> Jukka K. Korpela <jkorpela@cs.tut.fi> wrote:
>> Darin wrote that "HTML 3.0 (RIP) specified &endash; and &emdash;",
>> but I am unable to find such entities in the HTML 3.0 draft.
>
> They're mentioned here:
> http://www.w3.org/MarkUp/html3/specialchars.html
Thanks for the info. I had looked at the parts that specifically list
entities, and didn't find them there. It really was an incomplete
draft!
> The LaTeX manual refers to "endash" and "emdash" when describing
> the characters generated from the input "--" and "---", so there
> was at least some precedent.
And the names themselves are natural.
But entity names in HTML have generally been taken from the entity sets
in appendix D of the SGML standard. The names that were actually taken
into HTML specifications follow this principle. This is why they look
so odd, half-mnemonic and irregular. (E.g., em dash is "&mdash;" but
em space is "&emsp;". Actually some people might say this is logical in
an odd way, since em space is by Unicode definition a space with a
fixed width of em, the font size, whereas em dash might vary in width,
though historically and in common font design it has the width of one
em.)
--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html | 
July 20th, 2005, 06:49 PM
| | | Re: Non-ASCII characters in web pages
On Wed, 25 Feb 2004, Stan Brown wrote:
>> For example <http://www.unics.uni-hannover.de/nhtcapri/multilingual2.html>
>
> There's no text after the <h2>Mathematical Symbols</h2>
I wonder from where you got <h2>Mathematical Symbols</h2> ?
The source reads
<h2 class="noprint"><a href="mathematics.html">Mathematical symbols</a></h2> | 
July 20th, 2005, 06:49 PM
| | | Re: Non-ASCII characters in web pages
On Wed, 25 Feb 2004, Jukka K. Korpela wrote:
> But entity names in HTML have generally been taken from the entity sets
> in appendix D of the SGML standard. The names that were actually taken
> into HTML specifications follow this principle. This is why they look
> so odd, half-mnemonic and irregular. (E.g., em dash is "&mdash;" but
> em space is "&emsp;".
My understanding was that the length of entities in SGML was limited to
5 letters, therefore &acute; but &uml; and &circ; . Compositions with
base letters may have 6 letters, like &eacute; . |
July 20th, 2005, 06:49 PM
| | | Re: Non-ASCII characters in web pages
Andreas Prilop <nhtcapri@rrzn-user.uni-hannover.de> wrote:
> My understanding was that the length of entities in SGML was
> limited to 5 letters, therefore &acute; but &uml; and &circ; .
The SGML reference concrete syntax sets NAMELEN to 8, so I don't think
that's the explanation. (HTML for example, when formally defined as an
SGML application, sets NAMELEN to 65536.)
And you actually present some counterevidence:
> Compositions with base letters may have 6 letters, like &eacute; .
Besides, there's e.g. &middot;, which is in HTML too.
Actually they say in clause D.4.1.3 of the SGML standard:
"The entity names are derived from the English language.
They were chosen for a maximum mnemonic value, consistent
with the logical and systematic use of abbreviations."
(Sounds funny, doesn't it? Anyone who regards the entity names as
mnemonic has a memory that works in odd ways.) And:
"The entity names are limited to six characters in length - -".
So it's just an odd restriction they decided to impose, with no good
reason within SGML itself. We old-timers might compute 1+6+1 = 8 and
guess that the idea was that the entity reference as a whole, including
"&" and ";", would fit into a doubleword for efficiency.
--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html | 
July 20th, 2005, 06:49 PM
| | | Re: Non-ASCII characters in web pages
"Jukka K. Korpela" <jkorpela@cs.tut.fi> wrote:
>> My understanding was that the length of entities in SGML was
>> limited to 5 letters, therefore &acute; but &uml; and &circ; .
>
> And you actually present some counterevidence:
Our five letters are "a c u t e" and "e" .... our six letters are ....
I'll come in again.
You are right. Several of the Latin-1 entities show the abbreviation
with six letters:
&curren; &brvbar; &plusmn; &frac12; &iquest; |
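The six-character observation can be checked against the entity table that ships with Python. This is only a sketch under one assumption: that the entries of `html.entities.name2codepoint` with code points U+00A0 to U+00FF correspond to the Latin-1 names discussed here. It reports the longest Latin-1 name and, for contrast, the names elsewhere in the HTML 4 table that run past six characters.

```python
from html.entities import name2codepoint

# Entity names whose code point falls in the Latin-1 range U+00A0..U+00FF.
latin1_names = [name for name, cp in name2codepoint.items() if 0xA0 <= cp <= 0xFF]

# Length of the longest Latin-1 entity name.
longest = max(len(name) for name in latin1_names)

# Latin-1 names that use the full six characters (e.g. curren, iquest).
six_letter = sorted(name for name in latin1_names if len(name) == 6)

# Names anywhere in the HTML 4 table that are longer than six characters.
over_six = sorted(name for name in name2codepoint if len(name) > 6)

print(longest)
print(six_letter)
print(over_six)
```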
July 20th, 2005, 06:49 PM
| | | Re: Non-ASCII characters in web pages
It seems "Andreas Prilop" wrote in
comp.infosystems.www.authoring.html:
>On Wed, 25 Feb 2004, Stan Brown wrote:
>
>>> For example <http://www.unics.uni-hannover.de/nhtcapri/multilingual2.html>
>>
>> There's no text after the <h2>Mathematical Symbols</h2>
I guessed -- it was the same size as the other major headings on the
page. <h2> is the usual way to do a major heading.
>I wonder from where you got <h2>Mathematical Symbols</h2> ?
>The source reads
><h2 class="noprint"><a href="mathematics.html">Mathematical symbols</a></h2>
I am sure you are not seriously suggesting that visitors should have
to view source of a page to determine whether something is a link!
I think the (unidentified) author of that page goofed. A heading
logically should be followed by text. If it's a link with no text
then it's not logically a heading -- and if it's a link it should
look like one.
--
Stan Brown, Oak Road Systems, Cortland County, New York, USA http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
2.1 changes: http://www.w3.org/TR/CSS21/changes.html
validator: http://jigsaw.w3.org/css-validator/ | 
July 20th, 2005, 06:49 PM
| | | Re: Non-ASCII characters in web pages
On Wed, 25 Feb 2004 09:19:04 -0500, Stan Brown
<the_stan_brown@fastmail.fm> declared in
comp.infosystems.www.authoring.html:
> Talk about non-obvious navigation!
Took me a while to work it out. Sure, it was blue and the other headings
were black, but from memory the only reason I realised was because I
happened to run my mouse over it. :-S
--
Mark Parnell http://www.clarkecomputers.com.au | 
July 20th, 2005, 06:50 PM
| | | Re: Non-ASCII characters in web pages
It seems "Mark Parnell" wrote in
comp.infosystems.www.authoring.html:
>On Wed, 25 Feb 2004 09:19:04 -0500, Stan Brown
><the_stan_brown@fastmail.fm> declared in
>comp.infosystems.www.authoring.html:
>
>> Talk about non-obvious navigation!
>
>Took me a while to work it out. Sure, it was blue and the other headings
>were black, but from memory the only reason I realised was because I
>happened to run my mouse over it. :-S
If the page gave any indication of its authorship, I'd send the
author a note.
It never ceases to amaze me how many pages give absolutely no
contact information, not even a link to it. Sometimes by shortening
the URL one can stumble onto a home page, but lately that seems to
lead more often to a "not found" or "forbidden", so I usually don't
bother.
--
Stan Brown, Oak Road Systems, Cortland County, New York, USA http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
2.1 changes: http://www.w3.org/TR/CSS21/changes.html
validator: http://jigsaw.w3.org/css-validator/ | 
July 20th, 2005, 06:50 PM
| | | Re: Non-ASCII characters in web pages
Stan Brown <the_stan_brown@fastmail.fm> wrote:
>>I wonder from where you got <h2>Mathematical Symbols</h2> ?
>>The source reads
>><h2 class="noprint"><a href="mathematics.html">Mathematical
>>symbols</a></h2>
>
> I am sure you are not seriously suggesting that visitors should
> have to view source of a page to determine whether something is a
> link!
I'm afraid that's what we may need to do at times, as users. :-(
> I think the (unidentified) author of that page goofed. A heading
> logically should be followed by text. If it's a link with no text
> then it's not logically a heading -- and if it's a link it should
> look like one.
Well, yes. What I would do is to keep the heading, removing the link
markup there, and adding the following after it:
<div>See separate page <cite><a href="mathematics.html"
rel="next">Mathematical formulas in HTML 4.0</a></cite>.</div>
(probably wrapping this together with the heading inside
<div class="noprint">...</div>)
--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html |