number or name for special character

The Bicycling Guitarist

A browser conforming to HTML 4.0 is required to recognize &#number;
notations.
If I use XHTML 1.0 and charset UTF-8 though, does &eacute; have as much
support as &#233; ?

Sometimes when I run the TIDY utility on my code, it replaces my character
notations with weird looking things I don't recognize. Also, when I
converted to UTF-8 from ISO-8859-1, I discovered many special characters
didn't make the transition well.

Thanks for all the help from the wonderful people in this group.

Jul 23 '05 #1

Mark Parnell

On Tue, 02 Nov 2004 01:36:55 GMT, The Bicycling Guitarist
<Ch***@TheBicyclingGuitarist.net> declared in
comp.infosystems.www.authoring.html:

If I use XHTML 1.0 and charset UTF-8 though, does &eacute; have as much
support as &#233; ?
In general numeric entities have better support than their named
equivalents. Not sure about that one specifically.

Not entirely what you were asking, but this shows how well supported
each of the *numeric* entities are (a bit outdated now, though newer
versions of browsers would presumably support at least the same ones
that earlier versions did).
http://www.fjordaan.uklinux.net/enti...s_support.html
Sometimes when I run the TIDY utility on my code, it replaces my character
notations with weird looking things I don't recognize.
Examples (ideally before and after URLs)?
Also, when I
converted to UTF-8 from ISO-8859-1, I discovered many special characters
didn't make the transition well.

That's presumably because those characters are part of UTF-8 but not
ISO-8859-1.

--
Mark Parnell
http://www.clarkecomputers.com.au

Jul 23 '05 #2

Tim

On Tue, 02 Nov 2004 01:36:55 GMT,
"The Bicycling Guitarist" <Ch***@TheBicyclingGuitarist.net> posted:

A browser conforming to HTML 4.0 is required to recognize &#number;
notations.

If I use XHTML 1.0 and charset UTF-8 though, does &eacute; have as much
support as &#233; ?
Quite probably. I haven't found a browser that didn't support entities
for any of the characters that aren't too unusual. You're probably more
likely to strike problems with the browser trying to display something that
the current font is inadequate for than the browser not supporting the
character.

Personally, I prefer named entities to numerical references. If, for
some reason, my browser can't display some reference it's going to show the
code for what it can't do (*). I've got a fair guess at what I should have
seen if it writes &eacute; on the page, but I'd have to look up what &#233;
referred to.

* Some browsers don't show details for what you're missing out on, they'll
just print a ? or a blank box. Very unhelpful...

There's a point of view that says to avoid one thing in particular, though:
The euro. With a recommendation to write the name, normally, rather than
try and use a symbol for it. (There isn't always a symbol for it on the
system, or it's not authored right and the browser, therefore, doesn't
display it. Also, not all countries use it - in Australia you'd get very
little comprehension about what a euro is.)
Sometimes when I run the TIDY utility on my code, it replaces my character
notations with weird looking things I don't recognize. Also, when I
converted to UTF-8 from ISO-8859-1, I discovered many special characters
didn't make the transition well.

Usually that's because you're not actually authoring in the encoding system
that you think you are. If you're editing in plain text editors on older
Windows systems, like Win98SE, you're probably better off telling tidy that
it's receiving win1252 encoding. For newer systems, or fancier editors, it
might be one of the UTF schemes, but perhaps not UTF-8 (Windows' idea of
Unicode might be UTF16 or UTF8, depending on the application - but with no
indication of which it's using, you'll have to test things).
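
A quick way to see this kind of mismatch for yourself is the sketch below. It is only an illustration in Python 3 (not something tidy or any editor does for you), and the sample word is made up:

```python
# Sketch: the same text, saved in one encoding and read back in another.
text = "café"

utf8_bytes = text.encode("utf-8")      # é becomes the two bytes C3 A9
cp1252_bytes = text.encode("cp1252")   # é becomes the single byte E9

# UTF-8 bytes misread as Windows-1252: each byte shows up as its own character.
misread = utf8_bytes.decode("cp1252")

# Windows-1252 bytes misread as UTF-8: a lone E9 byte is not valid UTF-8.
try:
    cp1252_bytes.decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False

print(repr(misread))      # 'cafÃ©'
print(strict_utf8_ok)     # False
```

If your "weird looking things" resemble the first result, the file is most likely UTF-8 that some tool is reading as a Windows code page.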

Say what system and software you're using, someone might be able to tell
you what it's doing, or let you know if it has any peculiar foibles.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.

Jul 23 '05 #3

Pierre Goiffon

"Tim" <ti*@mail.localhost.invalid> a écrit dans le message de
news:1a****************************@40tude.net

If I use XHTML 1.0 and charset UTF-8 though, does &eacute; have as
much support as &#233; ?

If your page is really encoded in UTF-8, you shouldn't need to use entities
!
There's a point of view that says to avoid one thing in particular,
though: The euro.

I thought support for the Euro symbol nowadays was really wide: Windows
and Macintosh at least have supported it for years... Are the computers outside the
Euro zone running such old software that they don't support the Euro symbol?

Jul 23 '05 #4

Alan J. Flavell

On Tue, 2 Nov 2004, Mark Parnell wrote:

On Tue, 02 Nov 2004 01:36:55 GMT, The Bicycling Guitarist
<Ch***@TheBicyclingGuitarist.net> declared in
comp.infosystems.www.authoring.html:
If I use XHTML 1.0 and charset UTF-8 though, does &eacute; have as much
support as &#233; ?

The answer to the question doesn't really depend on those
preconditions. Support for character entities and numeric references
doesn't somehow vary according to what version of (X)HTML you are
writing. No properly-written browser (i.e that excludes Netscape 4.*
versions) changes its support for &-notation depending on what
character encoding scheme ("charset") is in use.
In general numeric entities
....correctly known as "numeric character references"...
have better support than their named equivalents.

"In general", this is true, agreed. But for the Latin-1 entities
(that were defined in the appendix to RFC1866/HTML2.0), the coverage
of the two &-notations is by now essentially the same - and at least
the character entities have more mnemonic value.

But the same can not be said for most of the additional character
entities which were introduced in HTML4.

However, if you are -really- using utf-8 (instead of just pretending,
as suggested in
http://ppewww.ph.gla.ac.uk/~flavell/...cklist.html#s6 ), then
you don't need to use either of the &-notations (except of course for
HTML-significant characters "<" and "&").
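
As a rough illustration of that point (a sketch in Python 3, with a made-up sample string): when the document really is stored as UTF-8, only the markup-significant characters need an &-notation, and everything else can stay as the actual character.

```python
import html

# Sample text with non-ASCII letters and markup-significant characters.
text = "Müller & Söhne <GmbH>"

# quote=False escapes only &, < and >; ü and ö are left as real characters.
escaped = html.escape(text, quote=False)

# The bytes that would actually be served when the page is genuinely UTF-8.
served = escaped.encode("utf-8")

print(escaped)   # Müller &amp; Söhne &lt;GmbH&gt;
```

Note that `html.escape` also escapes ">", which is harmless though not strictly required.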

I think you'll find the various options listed in that checklist are
compatible with reality. If not, then I'm keen to hear about it.

Jul 23 '05 #5

The Bicycling Guitarist

"Pierre Goiffon" <pg******@nowhere.invalid> wrote in message
news:41**********************@news.free.fr...

"Tim" <ti*@mail.localhost.invalid> a écrit dans le message de
news:1a****************************@40tude.net
If I use XHTML 1.0 and charset UTF-8 though, does &eacute; have as
much support as &#233; ?

If your page is really encoded in UTF-8, you shouldn't need to use
entities
!

Enough said about the tech support at my host's i.s.p. I do include a meta
tag specifying UTF-8.

Is it bad though to put &copy; (for example) in the code instead of the
copyright symbol character? I am especially curious about special characters
in meta tags such as descriptions. I think I've seen some of my descriptions
with the character entity rendered as text (i.e. spelling out the code for
the entity instead of rendering the specified character) by some search
engines or search engine simulators. Sorry I don't remember which ones. An
interesting one to check though would be "Shopzilla"
http://www.TheBicyclingGuitarist.net.../shopzilla.htm where I use &Ouml; in meta
tags and body text.

separate thread issue: Should I wrap attributes in the code? Is there a
recommended length of a line of code at which I should wrap? a maximum
length? I was able to reduce file size by ten percent by eliminating some
whitespace in some files while still preserving some "pretty print" to make
editing easier. I have a lot of spaces to delete throughout my web site!

Chris Watson a.k.a. "The Bicycling Guitarist"
www.TheBicyclingGuitarist.net/

Jul 23 '05 #6

Andreas Prilop

On Tue, 2 Nov 2004, The Bicycling Guitarist wrote:

X-Newsreader: Microsoft Outlook Express 6.00.2900.2180
"Tim" <ti*@mail.localhost.invalid> a ?crit dans le message de

You need to select

Tools > Options > Send
Mail Sending Format > Plain Text Settings > Message format MIME
News Sending Format > Plain Text Settings > Message format MIME
Encode text using: None

to send special, non-ASCII characters.
Is it bad though to put &copy; (for example) in the code instead of the
copyright symbol character?
No, why?
I am especially curious about special characters
in meta tags such as descriptions. I think I've seen some of my descriptions
with the character entity rendered as text (i.e. spelling out the code for
the entity instead of rendering the specified character) by some search
engines or search engine simulators.
AltaVista did this in the past - but no longer.
I have a lot of spaces to delete throughout my web site!

Use tabs instead of spaces.

--
Top-posting.
What's the most irritating thing on Usenet?

Jul 23 '05 #7

Jukka K. Korpela

"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:

But for the Latin-1 entities
(that were defined in the appendix to RFC1866/HTML2.0), the coverage
of the two &-notations is by now essentially the same

_Except_ when genuine XHTML is used. A non-validating XML parser is not
required to process an external subset, and technically the entities are
defined in an external subset. If you serve an XHTML document genuinely
as XHTML, i.e. with Content-Type: application/xhtml+xml, then a
conforming browser is not required to recognize predefined entity
references. And Opera indeed fails to recognize them; and it has been
reported that so does Safari.

Hence, there's little point in using entities for characters, if you use
XHTML.

Character references such as &#233; work on all browsers.
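
The XML rule can be tried out with any non-validating parser. The sketch below uses Python's standard `xml.etree.ElementTree` (expat-based), which is not a browser, so treat it as an illustration of the principle only; the helper name `parses` is made up for this example.

```python
import xml.etree.ElementTree as ET

def parses(fragment: str) -> bool:
    """Return True if a non-validating XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# Numeric character reference: always understood by an XML parser.
numeric_ok = parses("<p>caf&#233;</p>")

# HTML named entity with no DTD available: an undefined entity in XML.
named_ok = parses("<p>caf&eacute;</p>")

# One of the five entities predefined by XML itself: fine.
predefined_ok = parses("<p>Simon &amp; Garfunkel</p>")

print(numeric_ok, named_ok, predefined_ok)   # True False True
```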

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 23 '05 #8

The Bicycling Guitarist

"Jukka K. Korpela" <jk******@cs.tut.fi> wrote in message
news:Xn*****************************@193.229.0.31. ..

"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
But for the Latin-1 entities
(that were defined in the appendix to RFC1866/HTML2.0), the coverage
of the two &-notations is by now essentially the same

_Except_ when genuine XHTML is used. [...]
Hence, there's little point in using entities for characters, if you use
XHTML.

Character references such as &#233; work on all browsers.

Should I convert all the &quot; &copy; &Ouml; and so on throughout my site
into the &#number; form? What about &amp;? Should it also be in a &#number;
form?

A man's got to know his limitations
(Dirty Harry)

Jul 23 '05 #9

Alan J. Flavell

On Tue, 2 Nov 2004, Jukka K. Korpela wrote:

"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
But for the Latin-1 entities
(that were defined in the appendix to RFC1866/HTML2.0), the coverage
of the two &-notations is by now essentially the same
_Except_ when genuine XHTML is used.

OK...
A non-validating XML parser is not required to process an external
subset, and technically the entities are defined in an external
subset.
That's the theory, yes.

I'd remind anyone interested that the three forms of character
representation (&name; &#number; and the actual "coded character") are
genuine alternatives in HTML, and can be straightforwardly converted
into each other. Anyone who in future starts to encounter problems
with non-validating XML parsers should (assuming they have been
writing valid syntax) be able to pass their stuff through a trivial
convertor and have no worries at all about which form they prefer at
authoring time.
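
For what it's worth, such a convertor could look roughly like the sketch below. It is Python 3, the function name `named_to_numeric` is invented for this example, and it relies on the standard library's `html.entities.name2codepoint` table of HTML 4 entity names. It assumes the input is already valid markup.

```python
import re
from html.entities import name2codepoint

# The entities predefined by XML itself can safely stay in named form.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a complete named reference such as &eacute; (semicolon required).
_NAMED_REFERENCE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def named_to_numeric(markup: str) -> str:
    """Rewrite HTML 4 named entities as decimal numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)          # leave unknown or XML names alone
        return f"&#{name2codepoint[name]};"
    return _NAMED_REFERENCE.sub(replace, markup)

print(named_to_numeric("&eacute; &copy; &Ouml;"))   # &#233; &#169; &#214;
print(named_to_numeric("&quot; &amp; &bogus;"))     # &quot; &amp; &bogus;
```

Going the other way (numeric to named, or either form to the real character) is just as mechanical, which is why the choice at authoring time matters so little.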
If you serve an XHTML document genuinely as XHTML, i.e. with
Content-Type: application/xhtml+xml, then a conforming browser is
not required to recognize predefined entity references. And Opera
indeed fails to recognize them; and it has been reported that so
does Safari.
AFAICS the only point in doing that is where you have a mind to *add*
something to HTML, such as SVG or mathML. It's utterly futile to go
writing XHTML/1.1 or later if in fact you're wanting nothing more than
what HTML/4.01 can provide.
Hence, there's little point in using entities for characters, if you use
XHTML.

Character references such as &#233; work on all browsers.

I don't disagree. But those who are writing SVG and mathML and the
like, and expecting them to do something useful in the WWW context,
have quite a lot of other things to concern themselves with too.

But what's the question, *really* ?

* what should the author type (HTML notations, keyboard methods...) ?

* what processing should the editing software perform ?

and so on.

As an author, I'm not wanting to go typing in literally &#1234; (for
some value of 1234, which I'd have to learn for each character) for
every non-ascii character. It's a compromise.

Jul 23 '05 #10

Spartanicus

"Jukka K. Korpela" <jk******@cs.tut.fi> wrote:

as XHTML, i.e. with Content-Type: application/xhtml+xml, then a
conforming browser is not required to recognize predefined entity
references. And Opera indeed fails to recognize them;

Again: Opera recognizes character references in x(ht)ml mode since 7.5.

--
Spartanicus

Jul 23 '05 #11

Brian

The Bicycling Guitarist wrote:

"Jukka K. Korpela" wrote ...
_Except_ when genuine XHTML is used. [...]
Hence, there's little point in using entities for characters, if
you use XHTML.

Character references such as &#233; work on all browsers.

Should I convert all the &quot; &copy; &Ouml; and so on throughout my
site into the &#number; form?

No. Just use HTML 4.01 (strict).

--
Brian (remove "invalid" to email me)

Jul 23 '05 #12

Jan Roland Eriksson

On Tue, 2 Nov 2004 17:24:53 +0000 (UTC), "Jukka K. Korpela"
<jk******@cs.tut.fi> wrote:

[...]

Character references such as &#233; work on all browsers.

Agreed; but it, sort of, defeats the idea of a "late binding" between
what gets put into a web page and how that thing gets handled by a UA.

--
Rex

Jul 23 '05 #13

The Bicycling Guitarist

"Jan Roland Eriksson" <jr****@newsguy.com> wrote in message
news:91********************************@4ax.com...

On Tue, 2 Nov 2004 17:24:53 +0000 (UTC), "Jukka K. Korpela"
<jk******@cs.tut.fi> wrote:

[...]
Character references such as &#233; work on all browsers.

Agreed; but it, sort of, defeats the idea of a "late binding" between
what gets put into a web page and how that thing gets handled by a UA.

LOL. This morning I replaced most of the named entities in my web site by
their equivalent &#number; forms.

Two characters I am not certain of are the &quot; and the &amp; Should these
be replaced too, assuming I stick with xhtml instead of going back to html
4.01? I couldn't find their number forms in the table of characters I was
using.

Thanks to everyone, again. Feel free to point to my site as an example of a
personal web site whose author at least *tries* to conform to w3c
recommendations.

Chris Watson a.k.a. "The Bicycling Guitarist"
www.TheBicyclingGuitarist.net/

Jul 23 '05 #14

The Bicycling Guitarist

"The Bicycling Guitarist" <Ch***@TheBicyclingGuitarist.net> wrote in message
news:Mp*******************@newssvr14.news.prodigy. com...

Two characters I am not certain of are the &quot; and the &amp; Should
these be replaced too, assuming I stick with xhtml instead of going back
to html 4.01? I couldn't find their number forms in the table of
characters I was using.

I found &#34; for &quot; and &#38; for &amp;
I still don't know if I should replace these (or any of the) named forms
with the numbered forms. I really don't want to go back to a de facto
standard that was introduced in 1997, even though I was shocked to learn
that none of the entities named or numbered are recognized by xhtml 1.

Jul 23 '05 #15

Lauri Raittila

in comp.infosystems.www.authoring.html, The Bicycling Guitarist wrote:

LOL. This morning I replaced most of the named entities in my web site by
their equivalent &#number; forms.

Two characters I am not certain of are the &quot; and the &amp; Should these
be replaced too, assuming I stick with xhtml instead of going back to html
4.01? I couldn't find their number forms in the table of characters I was
using.
That is because the makers of such tables have not thought about the need
for them for characters in ASCII...

It seems that those are the about only ones that need to be changed in
XHTML though. Others you can deal with by using UTF-8...
Thanks to everyone, again. Feel free to point to my site as an example of a
personal web site whose author at least *tries* to conform to w3c
recommendations.

The problem, as usual, is that you aim at one purpose and make no
compromises on it. That is hardly ever a good idea.

If you used the time saved there for something else, you would get much
other stuff done... Myself, I certainly couldn't think of any better way to
spend 8 hours than reading/writing news and watching telly and surfing the net
today. But at least I did them simultaneously...

--
Lauri Raittila <http://www.iki.fi/lr> <http://www.iki.fi/zwak/fonts>

Jul 23 '05 #16

Tim

On Tue, 2 Nov 2004 15:37:41 +0100,
Andreas Prilop <nh******@rrzn-user.uni-hannover.de> posted:

X-Newsreader: Microsoft Outlook Express 6.00.2900.2180
"Tim" <ti*@mail.localhost.invalid> a ?crit dans le message de

You need to select

Tools > Options > Send
Mail Sending Format > Plain Text Settings > Message format MIME
News Sending Format > Plain Text Settings > Message format MIME
Encode text using: None

to send special, non-ASCII characters.

That does rather look like you're telling me to do that (seeing as you've
left an attribution to me, but nothing that I've contributed to the thread,
and wrote it right after my name), but I don't use MSOE. You might want to
make that a bit clearer it was for Pierre's benefit.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.

Jul 23 '05 #17

Neal

The Bicycling Guitarist wrote:

I found &#34; for &quot; and &#38; for &amp;

Let me say - I feel for you, my man.

I was there not so long ago, figuring this out the hard way. It's the only
way to do it. We can tell you all day long, it takes your own discovery to
actually get it.

Keep at it. Don't give up. In time you'll be there.

Jul 23 '05 #18

The Bicycling Guitarist

"Neal" <ne*****@yahoo.com> wrote in message
news:op**************@news.individual.net...

The Bicycling Guitarist wrote:
I found &#34; for &quot; and &#38; for &amp;
way to do it. We can tell you all day long, it takes your own discovery to
actually get it.

Keep at it. Don't give up. In time you'll be there.

Where? lol. Thanks, Neal. btw I like the new look of that community music
page you do. I do listen to what I am told in these forums. More than once I
have laboriously reworked every page of my site because of the (usually)
sound advice I get here. I'd like to think my web site's code is getting
better.
The Bicycling Guitarist

Jul 23 '05 #19

Roland Eriksson

On Tue, 02 Nov 2004 23:06:20 GMT, "The Bicycling Guitarist"
<Ch***@TheBicyclingGuitarist.net> wrote:

"Jan Roland Eriksson" <jr****@newsguy.com> wrote in message
news:91********************************@4ax.com.. .
On Tue, 2 Nov 2004 17:24:53 +0000 (UTC), "Jukka K. Korpela"
<jk******@cs.tut.fi> wrote:
Character references such as &#233; work on all browsers.
Agreed; but it, sort of, defeats the idea of a "late binding" between
what gets put into a web page and how that thing gets handled by a UA.

[...] Two characters I am not certain of are the &quot; and the &amp; Should these
be replaced too...

Well, theoretically...

If you use &quot you are effectively telling the UA that it should put
in a quote character as best it can, i.e. you are allowing the
"expansion" to come "late" in the rendering process and to become
whatever the browser can decide to be a defined content for a named
entity &quot;.

If you use &#34 you are _ordering_ the UA to render an ASCII quote
character without any option to work from some other content mapping
procedure.

If you write e.g. "Simon&Garfunkel" directly in your HTML, a validator
will complain that there is no entity "Garfunkel" defined.

If you write "Simon & Garfunkel" directly in your HTML, you are all OK
since an entity reference cannot have whitespace in it and thus the
character '&' will be rendered literally as it stands.

Maybe this page can give you some understanding of how named entity
references are supposed to work...

<http://css.nu/markup/markup-entities.html>

--
Rex

Jul 23 '05 #20

Pierre Goiffon

"Tim" <ti*@mail.localhost.invalid> a écrit dans le message de
news:gh****************************@40tude.net

"Tim" <ti*@mail.localhost.invalid> a ?crit dans le message de
You need to select

Tools > Options > Send (...) to send special, non-ASCII characters.

That does rather look like your telling me to do that (seeing as
you've left an attribution to me, but nothing that I've contributed
to the thread, and wrote it right after my name), but I don't use
MSOE. You might want to make that a bit clearer it was for Pierre's
benefit.

I use MSOE but it should be well configured, as I can see in my post header
:

MIME-Version: 1.0
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: 8bit

I think Andreas was referring to the Bicycling Guitarist's post, which
contains :

X-Newsreader: Microsoft Outlook Express 6.00.2900.2180

And no content type information at all.

Jul 23 '05 #21

Pierre Goiffon

"Andreas Prilop" <nh******@rrzn-user.uni-hannover.de> a écrit dans le
message de news:Pine.GSO.4.44.0411021534270.19029-100000@s5b004

Is it bad though to put &copy; (for example) in the code instead of
the copyright symbol character?
No, why?

If the page is really encoded in UTF-8, why use an entity for the copyright
character, as it could be written "normally" (the character encoded in
UTF-8) ?

Jul 23 '05 #22

Pierre Goiffon

"Alan J. Flavell" <fl*****@ph.gla.ac.uk> a écrit dans le message de
news:Pi*****************************@ppepc56.ph.gl a.ac.uk

However, if you are -really- using utf-8 (instead of just pretending,
as suggested in
http://ppewww.ph.gla.ac.uk/~flavell/...cklist.html#s6 ), then
you don't need to use either of the &-notations (except of course for
HTML-significant characters "<" and "&").

I can't quite understand what led to this technique of pretending
to use utf-8 but sending only us-ascii with entities for all characters
outside ascii. Can you please give us the reason why someone should use it?

Jul 23 '05 #23

Alan J. Flavell

On Wed, 3 Nov 2004, Roland Eriksson wrote:

If you use &quot you are effectively telling the UA that it should
put in a quote character as best it can,
That may well be the SGML principle behind it, but the HTML
specification defines &quot; to be identical to &#34;

<!ENTITY quot CDATA "&#34;" -- double quote -->
i.e. you are allowing the "expansion" to come "late" in the
rendering process
Not in HTML, no. It's "late" in the SGML process, since it isn't
known until the DTD has been parsed; but thereafter there is no
freedom in HTML to redefine this entity, AFAICS.
Maybe this page can give you some understanding of how named entity
references are supposed to work...

<http://css.nu/markup/markup-entities.html>

Yes, in SGML; but, as you say yourself on that page, HTML rendering
agents don't implement this part of SGML. And character entity
references are already tied down in the HTML DTD. There's nothing in
the HTML specifications to give you authority to repurpose them (e.g
as smart quotes) at your whim.

Jul 23 '05 #24

Alan J. Flavell

On Wed, 3 Nov 2004, Pierre Goiffon wrote:

http://ppewww.ph.gla.ac.uk/~flavell/...cklist.html#s6 ), then
you don't need to use either of the &-notations (except of course for
HTML-significant characters "<" and "&").
I can't understand very well what lead to use this technique of pretending
using utf-8 but sending only us-ascii with entities for all characters
outside ascii.

The reason was, purely and simply, Netscape 4. I thought that was
presented in the notes to the cited page?

1. It does no harm to anything else that can otherwise handle
a wide character repertoire (let's exclude IE3 by now, OK?)

2. It doesn't require the author to master techniques of handling
utf-8 encoding

3. It's the only practical method (aside from genuinely using
utf-8 encoding) of persuading NN4.* versions to display a wide
character repertoire.
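
For anyone wondering what that technique produces in practice, here is a minimal sketch in Python 3 (the sample text is made up): every character outside ASCII is written as a &#number; reference, so the resulting bytes are pure us-ascii and are therefore also valid utf-8.

```python
import html

text = "Ærø café €"

# Replace every non-ASCII character with a decimal numeric character reference.
ascii_markup = text.encode("ascii", "xmlcharrefreplace")

print(ascii_markup)   # b'&#198;r&#248; caf&#233; &#8364;'

# The same bytes decode identically as us-ascii and as utf-8, so labelling
# the document as utf-8 is truthful even though only ASCII is sent.
same_either_way = ascii_markup.decode("ascii") == ascii_markup.decode("utf-8")
```

The markup-significant "<" and "&" still have to be escaped separately; this step does not touch them.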
Can you please give us the reason why someone should use it ?

That's it. No more, no less.

If you can handle utf-8 encoding, then just go right ahead and use it.
No objections from me.

Jul 23 '05 #25

Stan Brown

"The Bicycling Guitarist" <Ch***@TheBicyclingGuitarist.net> wrote in
comp.infosystems.www.authoring.html:

LOL. This morning I replaced most of the named entities in my web site by
their equivalent &#number; forms.

Despite the advice telling you not to, when you asked that very
question.

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/

Jul 23 '05 #26

Jukka K. Korpela

"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:

On Wed, 3 Nov 2004, Roland Eriksson wrote:
If you use &quot you are effectively telling the UA that it should
put in a quote character as best it can,
That may well be the SGML principle behind it,

Not in the sense Roland thinks about, I'm afraid.

The general SGML idea is explained in the SGML Handbook (p. 504) using
the example of frac78, which logically indicates a fraction 7/8 but could
be defined, depending on the output device and software used for
formatting, as just "7/8", or as a special character depicting the
fraction as a single symbol, or as a sequence of instructions to a
formatter. All this is lost in HTML, rather naturally, since HTML is for
use on the Web, where the characteristics of output devices are unknown
to authors.

The quotation mark is a different monster. Although the idea of using
&quot; so that it is rendered as a curly ("smart") quotation mark when
possible, falling back to straight Ascii quote when not, would work well
for Swedish or Finnish, it would not work for the languages that use
asymmetric quotation marks (e.g., English). So even in the SGML context,
the idea wouldn't work well. The idea of using an element, like
<q>...</q>, for quotations would work much better, but that's a different
story.
but the HTML
specification defines &quot; to be identical to &#34;

Indeed. And there's virtually never any need to use &quot;. In the rare
case where you would need it, namely inside an attribute value delimited
by quotation marks, you can alternatively avoid the problem by using
apostrophes as delimiters. The day you invent an attribute value that
contains _both_ " _and_ ', you have my permission to use &quot; or &#34;.
(Actually the latter is shorter and easier to get right - after all, we
all know the Ascii table by heart even when asleep, but we might or might
not remember whether it's &quot; or the more logical &quote;. :-))

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 23 '05 #27

Andreas Prilop

On Wed, 3 Nov 2004, Pierre Goiffon wrote:

Is it bad though to put &copy; (for example) in the code instead of
      ^^^
the copyright symbol character?

No, why?

If the page is really encoded in UTF-8, why use an entity for the copyright
character, as it could be written "normally" (the character encoded in
UTF-8) ?

The question was whether it is "bad", not whether it is necessary.
I don't see anything "bad" in writing &copy; .

--
Top-posting.
What's the most irritating thing on Usenet?

Jul 23 '05 #28

Andreas Prilop

On Wed, 3 Nov 2004, Pierre Goiffon wrote:

http://ppewww.ph.gla.ac.uk/~flavell/...cklist.html#s6

I can't understand very well what lead to use this technique of pretending
using utf-8 but sending only us-ascii with entities for all characters
outside ascii. Can you please give us the reason why someone should use it ?

(1) You can better understand the examples on
http://ppewww.ph.gla.ac.uk/~flavell/...direction.html
http://ppewww.ph.gla.ac.uk/~flavell/...ir-sample.html
http://www.unics.uni-hannover.de/nht...mazel-tov.html
with source in &#number; than with source in "pure UTF-8".

(2) I can use one and the same document sent in different encodings
http://www.unics.uni-hannover.de/nht...-alphabet.html
http://www.unics.uni-hannover.de/nht...alphabet.html6
Depending on operating system/browser/fonts, they may look quite
differently.

--
Top-posting.
What's the most irritating thing on Usenet?

Jul 23 '05 #29

Andreas Prilop

On Wed, 3 Nov 2004, Tim wrote:

MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
You need to select

Tools > Options > Send
Mail Sending Format > Plain Text Settings > Message format MIME
News Sending Format > Plain Text Settings > Message format MIME
Encode text using: None

to send special, non-ASCII characters.

That does rather look like your telling me to do that

No, I was telling Bicycling Guitarist to set up his newsreader^W
Outlook Express correctly. You do have a Content-Type header with
charset declaration.

Jul 23 '05 #30

Andreas Prilop

On Wed, 3 Nov 2004, Jukka K. Korpela wrote:

The day you invent an attribute value that
contains _both_ " _and_ ', you have my permission to use &quot; or &#34;.

One might want to have 57° 17' 45" as attribute value because
' " are more reliable than ′ and ″ .

--
Top-posting.
What's the most irritating thing on Usenet?

Jul 23 '05 #31

Pierre Goiffon

"Jukka K. Korpela" <jk******@cs.tut.fi> a écrit dans le message de
news:Xn*****************************@193.229.0.31

And there's virtually never any need to use &quot;. In the
rare case where you would need it, namely inside an attribute value
delimited by quotation marks, you can alternatively avoid the problem
by using apostrophes as delimiters.

When you have to edit text containing both ' and " in a CMS that presents
it in an input field, replacing " by ' is not an acceptable solution. And
this happens really often.

Jul 23 '05 #32

Pierre Goiffon

"Andreas Prilop" <nh******@rrzn-user.uni-hannover.de> a écrit dans le
message de news:Pine.GSO.4.44.0411031646520.6307-100000@s5b004

Is it bad though to put &copy; (for example) in the code instead of
      ^^^
the copyright symbol character?
No, why?
If the page is really encoded in UTF-8, why use an entity for the
copyright character, as it could be written "normally" (the
character encoded in UTF-8) ?
The question was whether it is "bad", not whether it is necessary.
I don't see anything "bad" in writing &copy; .

It could be bad for the folks who will have to edit the text... and take
extra care to replace every non-ascii character with its corresponding
entity :) But OK, I understand your answer now

Jul 23 '05 #33

Brian

The Bicycling Guitarist wrote:

Two characters I am not certain of are the &quot; and the &amp;
Should these be replaced too, assuming I stick with xhtml instead of
going back to html 4.01?

Why did you switch to xhtml? What does xhtml offer that html 4.01 does not?

--
Brian (remove "invalid" to email me)

Jul 23 '05 #34

Brian

Jukka K. Korpela wrote:

The day you invent an attribute value that contains _both_ " _and_ ',
you have my permission to use &quot; or &#34;.

Does it have to be a *good* example? ;-)

<img src="foo.png" alt="Joe's &quot;hands&quot;">

--
Brian (remove "invalid" to email me)

Jul 23 '05 #35

Neal

On Wed, 3 Nov 2004 15:09:00 +0000 (UTC), Jukka K. Korpela
<jk******@cs.tut.fi> wrote:

Indeed. And there's virtually never any need to use &quot;. In the rare
case where you would need it, namely inside an attribute value delimited
by quotation marks, you can alternatively avoid the problem by using
apostrophes as delimiters. The day you invent an attribute value that
contains _both_ " _and_ ', you have my permission to use &quot; or &#34;.

How would you code this (it is currently incorrect)?

<img ... alt="Beethoven's handwritten comments in his manuscript of the
"Ode To Joy" section of his 9th symphony read, "Freunde verhindert Freunde
mit Rahmen."">
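
One possible way to handle a value like this is to let a tool do the escaping rather than doing it by hand. The sketch below uses Python's standard html.escape, which turns both kinds of quote into character references, so the value is safe whichever delimiter surrounds it. The helper name alt_attribute is made up for illustration.

```python
import html


def alt_attribute(text: str) -> str:
    """Build an alt="..." attribute from arbitrary text.

    alt_attribute is a hypothetical helper, not part of any library.
    """
    # quote=True escapes " as &quot; and ' as &#x27; (as well as &, <, >)
    return 'alt="' + html.escape(text, quote=True) + '"'


if __name__ == "__main__":
    print(alt_attribute('Joe\'s "hands"'))
```

Only the double quotes strictly need escaping when the delimiter is a double quote; escaping the apostrophe as well is harmless.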

Jul 23 '05 #36

The Bicycling Guitarist

"Brian" <us*****@julietremblay.com.invalid> wrote in message
news:HA*******************@bgtnsc05-news.ops.worldnet.att.net...

The Bicycling Guitarist wrote:
Why did you switch to xhtml? What does xhtml offer that html 4.01 does
not?

I bought the propaganda of writing my code so it is readable by more user
agents and easier to convert to pure xml later. Besides, the X in front of
the HTML looks cool, like a space plane or something, lol.
The Bicycling Guitarist
www.TheBicyclingGuitarist/

Hey, I found I'm getting listed more in search engines the more I post and
give my URL. :-)

Jul 23 '05 #37

The Bicycling Guitarist

"Andreas Prilop" <nh******@rrzn-user.uni-hannover.de> wrote in message
news:Pine.GSO.4.44.0411031726500.6504-100000@s5b004...

On Wed, 3 Nov 2004, Tim wrote:
News Sending Format > Plain Text Settings > Message format MIME
Encode text using: None

No, I was telling Bicycling Guitarist to set up his newsreader^W
Outlook Express correctly. You do have a Content-Type header with
charset declaration.

Oops. My news sending format was NOT set up as you recommended. Thank you
Andreas Prilop.

Jul 23 '05 #38

Tim

On Wed, 3 Nov 2004 11:45:34 +0100,
"Pierre Goiffon" <pg******@nowhere.invalid> posted:

If the page is really encoded in UTF-8, why use an entity for the copyright
character, as it could be written "normally" (the character encoded in
UTF-8) ?

The thing that springs to mind is how difficult it can be to type the
symbol directly. I haven't seen a keyboard with it; some computers have
cryptic keyboard shortcuts to print it (which I usually have to look up
on some chart, so it's not a "shortcut"), and by the time you've waded your
way through some keyboard/font utility, it's probably easier to just type
"&copy;" into your page.

Probably the best, most user-friendly, option is an editor which lets you
program it to automatically be able to convert (C) to the copyright symbol,
but hopefully ignore (c) - so that you can type points into sentences.

e.g. Does this (a) work nicely, (b) work badly, (c) not work?
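
A rough sketch of that rule in Python, assuming the only thing wanted is a case-sensitive replacement of "(C)"; the function name expand_copyright is invented for the example and no particular editor is implied.

```python
COPYRIGHT = "\u00a9"  # the copyright sign, U+00A9


def expand_copyright(text: str) -> str:
    """Replace "(C)" with the copyright sign (hypothetical helper)."""
    # str.replace is case-sensitive, so lower-case "(c)" list markers survive
    return text.replace("(C)", COPYRIGHT)


if __name__ == "__main__":
    print(expand_copyright("(C) 2004. Does this (a) work, (b) fail, (c) not work?"))
```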

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.

Jul 23 '05 #39

The Bicycling Guitarist

"Stan Brown" <th************@fastmail.fm> wrote in message
news:MP***********************@news.odyssey.net...

"The Bicycling Guitarist" <Ch***@TheBicyclingGuitarist.net> wrote in
comp.infosystems.www.authoring.html:
LOL. This morning I replaced most of the named entities in my web site by
their equivalent &#number; forms.

Despite the advice telling you not to, when you asked that very
question.

But Stan, I was told that the number forms had wider support than the named
forms. My head hurts.
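
For what it's worth, the named-to-numeric replacement described above can be done mechanically. This is only a sketch in Python, assuming the HTML 4 entity names in the standard html.entities table are the ones in use; named_to_numeric is a made-up name, and unknown names are left untouched.

```python
import re
from html.entities import name2codepoint

# Matches references such as &eacute; or &copy;
_NAMED = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(markup: str) -> str:
    """Rewrite known named references as decimal &#number; references."""

    def repl(match):
        code = name2codepoint.get(match.group(1))
        # Leave anything that is not a known HTML 4 entity name alone
        return match.group(0) if code is None else f"&#{code};"

    return _NAMED.sub(repl, markup)


if __name__ == "__main__":
    print(named_to_numeric("caf&eacute; &copy; 2004"))
```

Note that this also rewrites &amp; as &#38;, which means the same thing.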

Jul 23 '05 #40

Lachlan Hunt

Tim wrote:

...it's probably easier to just type "&copy;" into your page.

Probably the best, most user-friendly, option is an editor which lets you
program it to automatically be able to convert (C) to the copyright symbol,
but hopefully ignore (c) - so that you can type points into sentences.

That would be good, and is actually the kind of thing that MS Word does,
although it's not even close to being an HTML editor (despite any claims
from M$ stating otherwise). However, another good method would be to
convert any character reference typed (such as &copy;, &#169;, etc.) to
the appropriate character, just as you suggested (C) would be converted,
i.e. the file is saved with the actual copyright symbol, rather than
the character reference. Personally, I find it easier to remember the
few numerical keyboard shortcuts that I use most often (the windows-1252
code point), and just enter a copyright symbol © by typing Alt+0169.
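
As an illustration of the conversion described above (not of any particular editor), Python's standard html.unescape turns both named and numeric references into the actual character, which can then be saved as UTF-8. The last line shows why Alt+0169 works: byte 169 in windows-1252 is the same copyright sign.

```python
import html

source = "&copy; 2004 and &#169; 2004"

# Both kinds of reference become the single character U+00A9
text = html.unescape(source)

# These are the bytes a UTF-8 file would contain
utf8_bytes = text.encode("utf-8")

# Alt+0169 enters code point 169 of windows-1252, the same character
alt_0169 = bytes([169]).decode("cp1252")

if __name__ == "__main__":
    print(text, utf8_bytes, alt_0169 == "\u00a9")
```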

For any characters that aren't available in Windows-1252, I just
generate them using these Unicode tools [1], which I've made available
as part of my copy of the rescued devedge sidebar [2], and then copy
and paste to the editor.

[1] http://lachy.id.au/dev/mozilla/sideb...haracter-tools
[2] http://lachy.id.au/blogs/log/2004/10/devedge-sidebar

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://SpreadFirefox.com/ Igniting the Web

Jul 23 '05 #41

Pierre Goiffon

"The Bicycling Guitarist" <Ch***@TheBicyclingGuitarist.net> wrote
in message news:Zw****************@newssvr13.news.prodigy.com

Despite the advice telling you not to, when you asked that very
question.
But Stan, I was told that the number forms had wider support than the
named forms. My head hurts.

You were also told many times that writing entities instead of the "real"
characters is useless :)

Jul 23 '05 #42

Pierre Goiffon

"Tim" <ti*@mail.localhost.invalid> wrote in message
news:hr***************************@40tude.net

If the page is really encoded in UTF-8, why use an entity for the
copyright character, as it could be written "normally"
The thing that springs to mind is how difficult it can be to type the
symbol, directly.

For Windows, Microsoft provides a very practical tool: the Microsoft
keyboard layout generator
(http://www.microsoft.com/globaldev/tools/msklc.mspx). It can be used to
remap almost any keyboard key! And it's really easy to do.

For French-speaking people, a very good layout is the one by Denis Liegeois:
http://home-14.tiscali-business.nl/~...35/kbdfrac.htm. I use it
daily, and it lets me type characters such as: ¥, T, ©,
®, ±, ½, ¾, ß, æ, «, ", <, ', ...

There must be similar tools for other operating systems.

Jul 23 '05 #43

Pierre Goiffon

"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote in message
news:Pi******************************@ppepc56.ph.gla.ac.uk

I can't quite understand what led to this technique of
pretending to use utf-8 but sending only us-ascii, with entities for
all characters outside ascii.
The reason was, purely and simply, Netscape 4.

Oh, OK. I wasn't sure from reading your web site: there's a brief reference
to it in "I18n Quickstart", but I wasn't sure I had understood it very
well.

Thanks for the answer :)

Jul 23 '05 #44

Pierre Goiffon

"Andreas Prilop" <nh******@rrzn-user.uni-hannover.de> wrote in
message news:Pine.GSO.4.44.0411031659530.6504-100000@s5b004

Hello Andreas, thanks for your answer, but I didn't understand it:

(1) You can better understand the examples on
http://ppewww.ph.gla.ac.uk/~flavell/...direction.html
http://ppewww.ph.gla.ac.uk/~flavell/...ir-sample.html
http://www.unics.uni-hannover.de/nht...mazel-tov.html
with source in &#number; than with source in "pure UTF-8".
You mean, for example, that it's easier for a non-French speaker to
understand &#339; than œ? (A better example would have been a non-Latin
character, but I can't find one easily.)
(2) I can use one and the same document sent in different encodings
http://www.unics.uni-hannover.de/nht...-alphabet.html
http://www.unics.uni-hannover.de/nht...alphabet.html6
Depending on operating system/browser/fonts, they may look quite
differently.

In these 2 pages all the Arabic characters are written using entities, but
one page is sent as utf-8 and the other as iso-8859-6. Which
differences exactly are you referring to?
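
A small Python check of the point behind the question, using two Arabic letters chosen only as an example: a page written entirely with numeric references is plain ASCII, so its bytes are identical under either declared encoding, whereas the literal characters encode differently. It says nothing about how a given browser or font then renders the page.

```python
# Alef and beh written as numeric character references (pure ASCII source)
page = "&#1575;&#1576;"

# The same two letters as literal characters
literal = "\u0627\u0628"

# With references, the bytes on the wire do not depend on the encoding
same_bytes = page.encode("utf-8") == page.encode("iso-8859-6")

# With literal characters, they do
literal_utf8 = literal.encode("utf-8")
literal_8859_6 = literal.encode("iso-8859-6")

if __name__ == "__main__":
    print(same_bytes, literal_utf8, literal_8859_6)
```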

Jul 23 '05 #45

Pierre Goiffon

[1] http://lachy.id.au/dev/mozilla/sideb...haracter-tools
Very good tool. Thanks very much for the URL!
[2] http://lachy.id.au/blogs/log/2004/10/devedge-sidebar

I heard the Mozilla Foundation could host DevEdge (there was a MozillaZine
post about it a few weeks ago); is there any news about that?

Jul 23 '05 #46

Lachlan Hunt

Pierre Goiffon wrote:

"Lachlan Hunt" <sp***********@gmail.com> wrote in message
news:41**************@gmail.com
[2] http://lachy.id.au/blogs/log/2004/10/devedge-sidebar

I heard the Mozilla Foundation could host DevEdge (there was a MozillaZine
post about it a few weeks ago); is there any news about that?

AFAIK, the plan is to licence all the old devedge content from AOL, and
I believe it will be hosted on mozilla.org or some related site. I have
no idea how long that is going to take, all I know is what I've read on
various blogs and mozillazine.

That sidebar will be permanently hosted there so you can keep using it,
or just save all the files and use it locally if you like. I'll keep
adding tools to it whenever I, or anyone else creates them.

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://SpreadFirefox.com/ Igniting the Web

Jul 23 '05 #47

Tim

"Tim" <ti*@mail.localhost.invalid>

The thing that springs to mind is how difficult it can be to type the
symbol, directly.

"Pierre Goiffon" <pg******@nowhere.invalid> posted:
For Windows, Microsoft provides a very practical tool: the Microsoft
keyboard layout generator
(http://www.microsoft.com/globaldev/tools/msklc.mspx). It can be used to
remap almost any keyboard key! And it's really easy to do.

Sounded promising until I saw this on the page: The requirements:

* Windows 2000, Windows XP, or Windows Server 2003 (MSKLC will not run
on Windows 95, Windows 98, Windows ME or Windows NT4).
* Microsoft .NET Framework v1.0 or v1.1 must be installed.

Probably easier just to use an editor that lets me program what the F keys
do.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.

Jul 23 '05 #48

Pierre Goiffon

That sidebar will be permanently hosted there so you can keep using
it, or just save all the files and use it locally if you like. I'll
keep adding tools to it whenever I, or anyone else creates them.

Super!
Thanks a lot :)))

Jul 23 '05 #49

Pierre Goiffon

"Tim" <ti*@mail.localhost.invalid> wrote in message
news:1h******************************@40tude.net

For Windows, Microsoft provides a very practical tool: the Microsoft
keyboard layout generator
Sounded promising until I saw this on the page: The requirements:

* Windows 2000, Windows XP, or Windows Server 2003 (MSKLC will not run
on Windows 95, Windows 98, Windows ME or Windows NT4).
* Microsoft .NET Framework v1.0 or v1.1 must be installed.

Probably easier just to use an editor that lets me program what the F
keys do.

Do as you like, but you won't be able to define something like this:
http://home-14.tiscali-business.nl/~fbou2235/frac1.gif
in a few clicks, as you can with MSKLC.

Jul 23 '05 #50
