On Sun, 7 Mar 2004, Hans Mabelis wrote:
> I'm new here; got here because suddenly the question came up: is html a
> 7-bit or an 8-bit language?
Neither. Not since RFC2070 and HTML4.*
> I seem to consistently suffer from character set issues.
That's a bit vague. Do you want to understand the underlying
principles (which is what I would recommend), or are you experiencing
specific problems? In the latter case you'd need to say a bit more
about what they are, and preferably put some of the problematic
materials online so that people can see for themselves what's going on.
> Of course, I can specify a specific character set
Actually no. The Document Character Set is always ISO 10646/Unicode.
What you _can_ specify is the character encoding, which in MIME
terminology is confusingly called "charset". Until you understand the
difference, none of this stuff is likely to make much sense, I'm
afraid.
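
To make the distinction concrete, here is a minimal sketch in Python
(my choice of language and of sample text, purely for illustration).
It assumes nothing beyond the standard library. The same characters
are serialized under three different "charset" values, yet the
character numbers, and the meaning of a numeric character reference,
stay the same because they refer to ISO 10646/Unicode:

```python
import html

# Hypothetical sample text: six characters, two of them outside US-ASCII.
text = "caf\u00e9 \u20ac"   # "café €"

# The character encoding (MIME "charset") only decides how the
# characters are turned into bytes on the wire.
latin9 = text.encode("iso-8859-15")
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")

# The byte sequences differ from one encoding to the next...
print(latin9, utf8, utf16, sep="\n")

# ...but decoding each with its own charset recovers the same characters,
# and so the same ISO 10646/Unicode character numbers.
code_points = [ord(c) for c in text]
print([f"U+{cp:04X}" for cp in code_points])

# A numeric character reference names a position in the Document
# Character Set, so &#8364; is the euro sign whatever charset was used
# to transmit the document.
euro = html.unescape("&#8364;")
print(euro == "\u20ac")
```

Note that the euro sign is a single byte in iso-8859-15 and three bytes
in utf-8, but in both cases it is the same character as far as HTML is
concerned.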
Some people have found the materials in my area
http://ppewww.ph.gla.ac.uk/~flavell/charset/ to be of use.
RFC2070 itself isn't bad either, even if it's somewhat dated. The
description of the character representation model in HTML/4.01 is also
reasonably clear. The hardest part is often un-learning things that
students are convinced they already understand.
Good luck.