On Sat, 17 Apr 2004, Ashmodai wrote:
http://www.santagata.us/characters/C...rEntities.html
after criticising the lack of precise version details, seems to
go and do the same:
The Arc (U+2312) is not supported by Gecko browsers,
What exactly do you mean by "not supported"? I'm looking at it right
now in Mozilla 1.6 on Windows.
nor is the large circle (U+25EF)
It's showing a circle, though it's not particularly large...
Presumably the presence or absence of these characters on our displays
depends not on the choice of browser/version, but rather on the
availability of fonts.
Anyhow, let's get back to the page itself.
I have a criticism of the cited page in its use of terminology. It
seems to consistently refer to "numeric character references" as
"numeric entities", and refers collectively to (what are properly
called) "numeric character references" and "character entities" under
the major heading of "entities". This sloppy terminology is quite
widespread, and I wouldn't normally get worked up about it in casual
usage, but in a situation like this where the distinction is rather
important, I really would rather see the terms used accurately.
The correct terminology is surely that given in
http://www.w3.org/TR/html4/charset.html#h-5.3
There are three ways to include characters in a document: the
character itself, and two kinds of "character reference".
These two kinds of "character reference" are:
- the numeric character reference
- the character entity reference (for those relatively few characters
where one is defined).
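
To make the distinction concrete, here is a minimal sketch in Python
(my choice of language purely for illustration). It assumes the
standard library's html.unescape as a stand-in decoder, which follows
HTML5 rules rather than any particular browser's behaviour, and it uses
e-acute (U+00E9) as an arbitrary example character:

```python
import html

# Three ways of putting the same character into a document.
literal = "\u00e9"    # the character itself
numeric = "&#233;"    # numeric character reference (decimal form)
named = "&eacute;"    # character entity reference, defined in HTML 4

# Decode each form; a literal character passes through unchanged.
decoded = [html.unescape(s) for s in (literal, numeric, named)]

# All three should resolve to the same single character.
all_same = len(set(decoded)) == 1
```

Only the third form depends on a name having been defined for the
character; the second works for any code point.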
In a testing situation like this, it's essential to make it clear
which of the two is under discussion. Referring to both kinds loosely
as "entities" causes confusion in the minds of those who are familiar
with the correct terminology, IMHO.
Of course the numeric character reference then comes in two flavours,
the decimal or the hexadecimal form. There's what seems to be a
pointless and incorrect distinction drawn between these in the
covering notes to the page:
<li>A numeric entity which is a decimal encoding of older Unicode
characters, [iso-8859-1].
<li>A numeric entity which is an 8-bit encoding (hex)
of present & future Unicode characters, [ISO10646].
There's no such distinction. Either a browser understands the
&#x...; form or it does not: earlier ones (e.g. NN4.* versions) did
not. But they had already started to understand Unicode values in
decimal per RFC2070 at an earlier stage of development. What I'm
saying is that decimal references are BY NO MEANS limited to
iso-8859-1 characters, but can be used even more widely than the
hexadecimal form, as shown e.g. in my rough-and-ready tables below
http://ppewww.ph.gla.ac.uk/~flavell/unicode/
My test pages of course are much less visually attractive than the
page under discussion, but they were made for a rather different
purpose.
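
As a rough illustration of the point about decimal versus hexadecimal
forms, here is a small Python sketch. The helper name
numeric_references is my own invention, and html.unescape from the
standard library stands in for a decoder that understands both forms
(it says nothing about what NN4 or any other browser does). It builds
both forms for a Latin-1 character and for the two symbols mentioned at
the top, and checks that each pair names the same character:

```python
import html

def numeric_references(code_point: int) -> tuple:
    """Return the (decimal, hexadecimal) references for one code point."""
    return f"&#{code_point};", f"&#x{code_point:X};"

# e-acute (inside iso-8859-1), the arc and the large circle (both outside it)
code_points = (0x00E9, 0x2312, 0x25EF)

pairs = {cp: numeric_references(cp) for cp in code_points}

# For each code point, both forms should decode to the same character.
equivalent = all(
    html.unescape(dec) == html.unescape(hexa) == chr(cp)
    for cp, (dec, hexa) in pairs.items()
)
```

The two forms differ only in how the number is written: U+2312 is
&#8978; in decimal and &#x2312; in hexadecimal.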
And a word of warning about tests with IE: the results depend not only
on what fonts are available, but also on which language options have
been installed. For example, I found that installing Japanese
language support (which I didn't actually need because I can't read
it) nevertheless enabled a whole swath of useful symbols that hadn't
previously worked, and that had no obvious relationship to Japanese.
all the best