
Character set

Hans Mabelis
#1: Jul 20 '05
I'm new here; got here because suddenly the question came up: is html a
7-bit or an 8-bit language? Officially, I mean.
I seem to consistently suffer from character set issues. Of course, I can
specify a specific character set - but that doesn't guarantee the receiving
computer will have that set on board.
Can anyone tell me more? Where to find guidelines, and real-world info?

Hans



Jukka K. Korpela
#2: Jul 20 '05


"Hans Mabelis" <hans@mabelis.nl> wrote:
> I'm new here;

Checking the FAQ is advisable then. It's a bit dusty, but checking it
is better than starting from scratch in every thread. You might start
from http://www.htmlhelp.com/faq/html/bas...l#special-char
> is html a 7-bit or an 8-bit language?

Yes. And no. You can use a 7-bit encoding, or an 8-bit encoding, or any
other encoding for an HTML document.
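
A minimal Python sketch of that point, with an arbitrary sample string: the same text serialized as 7-bit ASCII (the non-ASCII letter written as a numeric character reference), as 8-bit ISO-8859-1, and as UTF-8. It is only an illustration of the three byte forms, not a recommendation of any one of them.

```python
# The same text, serialized three ways for use in an HTML document.
text = "caf\u00e9"  # "cafe" with an acute accent; arbitrary sample string

# 7-bit: every byte is below 128. The non-ASCII letter becomes a numeric
# character reference, which an HTML parser resolves back to U+00E9.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")

# 8-bit: ISO-8859-1 stores U+00E9 as the single byte 0xE9.
latin1_bytes = text.encode("iso-8859-1")

# Variable width: UTF-8 stores U+00E9 as the two bytes 0xC3 0xA9.
utf8_bytes = text.encode("utf-8")

for label, data in [
    ("ascii", ascii_bytes),
    ("iso-8859-1", latin1_bytes),
    ("utf-8", utf8_bytes),
]:
    print(f"{label:>10}: {data!r}")
```

All three byte strings describe the same characters once the encoding is known; only the byte form on the wire differs.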
> I seem to consistently suffer from character set issues.

Then please specify them, with URLs, after checking the basic
resources.
> Of course, I can specify a specific character set

I'm afraid that could mean rather different things,
> - but that doesn't guarantee
> the receiving computer will have that set on board.

Indeed. The safest bet in practice is ASCII. The second-safest in
theory (and pretty much in practice too, in worldwide considerations)
is UTF-8, if you know how to produce and announce it. But I'm not sure
whether you mean character encoding, character repertoire, or font.
Three different beasts.
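
For the "produce and announce" part, a rough Python sketch. The helper name `build_utf8_response` and the sample page are hypothetical; the only firm points are that the bytes really have to be UTF-8 and that the `charset` parameter of the `Content-Type` header has to say so. The last lines show what a reader gets when the client assumes a different encoding.

```python
def build_utf8_response(page: str) -> tuple[dict[str, str], bytes]:
    """Hypothetical helper: encode a page as UTF-8 and announce that encoding."""
    body = page.encode("utf-8")  # produce
    headers = {
        "Content-Type": "text/html; charset=utf-8",  # announce
        "Content-Length": str(len(body)),  # counted in bytes, not characters
    }
    return headers, body


page = "<title>Caf\u00e9</title><p>Caf\u00e9 au lait</p>"  # sample content
headers, body = build_utf8_response(page)

# A client that honours the announced encoding recovers the text exactly.
assert body.decode("utf-8") == page

# A client that assumes ISO-8859-1 instead reads each two-byte sequence
# as two separate characters.
misread = body.decode("iso-8859-1")
print(headers["Content-Type"])
print(misread)
```

Note that none of this touches fonts or repertoire: it only settles how bytes map to characters.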

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Alan J. Flavell
#3: Jul 20 '05


On Sun, 7 Mar 2004, Hans Mabelis wrote:
> I'm new here; got here because suddenly the question came up: is html a
> 7-bit or an 8-bit language?

No. Not since RFC 2070 and HTML 4.*
> I seem to consistently suffer from character set issues.

That's a bit vague. Do you want to understand the underlying
principles (which is what I would recommend) or are you experiencing
specific problems (in which case you'd need to say a bit more about
what they are, and preferably put some of the problematic materials
online so that people can see for themselves what's going on).
> Of course, I can specify a specific character set

Actually, no. The Document Character Set is always ISO 10646/Unicode.
What you _can_ specify is the character encoding, which in MIME
terminology is confusingly called "charset". Until you understand the
difference, none of this stuff is likely to make much sense, I'm
afraid.
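
One way to see the difference, as a minimal Python sketch: two byte streams in different encodings ("charsets") that denote the same sequence of Unicode characters. The helper `parsed_text` is hypothetical and far simpler than a real HTML parser; it uses the standard library's `html.unescape` to resolve character references. The numeric reference `&#8364;` means code point U+20AC whatever the encoding is, which is why the ISO-8859-1 document can carry a euro sign even though that encoding has no byte for it.

```python
import html


def parsed_text(raw: bytes, charset: str) -> str:
    """Hypothetical helper: decode the bytes, then resolve character references."""
    return html.unescape(raw.decode(charset))


# ISO-8859-1 cannot encode the euro sign directly, so this document
# writes it as a numeric character reference (8364 == 0x20AC).
latin1_doc = b"caf\xe9: 2 &#8364;"

# UTF-8 can encode every Unicode character directly.
utf8_doc = "caf\u00e9: 2 \u20ac".encode("utf-8")

a = parsed_text(latin1_doc, "iso-8859-1")
b = parsed_text(utf8_doc, "utf-8")

# Different bytes, different encodings, same characters.
assert latin1_doc != utf8_doc
assert a == b
print(a)
```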

Some people have found the materials in my area
http://ppewww.ph.gla.ac.uk/~flavell/charset/ to be of use.

But RFC 2070 itself isn't bad, even if it's somewhat dated. The
description of the character representation model in HTML 4.01 is also
reasonably clear. The hardest part is often unlearning things that
students are convinced they already understand.

Good luck.