
Character set

I'm new here; got here because suddenly the question came up: is html a
7-bit or an 8-bit language? Officially, I mean.
I seem to consistently suffer from character set issues. Of course, I can
specify a specific character set - but that doesn't guarantee the receiving
computer will have that set on board.
Can anyone tell me more? Where to find guidelines, and real-world info?

Hans
Jul 20 '05 #1
2 Replies


"Hans Mabelis" <ha**@mabelis.nl> wrote:

> I'm new here;

Checking the FAQ is advisable then. It's a bit dusty, but checking it
is better than starting from scratch in every thread. You might start
from http://www.htmlhelp.com/faq/html/bas...l#special-char

> is html a 7-bit or an 8-bit language?

Yes. And no. You can use a 7-bit encoding, or an 8-bit encoding, or any
other encoding for an HTML document.
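
To make that concrete, here is a minimal Python sketch (not from the original reply; the sample word and variable names are assumptions chosen for illustration). It serialises the same text three ways: as 7-bit ASCII with a numeric character reference, as 8-bit ISO-8859-1, and as UTF-8.

```python
# Illustrative only: the sample text and variable names are assumptions.
text = "caf\u00e9"  # "cafe" with an acute accent: one character outside ASCII

# 7-bit: pure ASCII bytes, with the non-ASCII character written as a
# numeric character reference.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")

# 8-bit: ISO-8859-1 stores the accented letter as a single byte.
latin1_bytes = text.encode("iso-8859-1")

# Multi-byte: UTF-8 stores the accented letter as two bytes.
utf8_bytes = text.encode("utf-8")

print(ascii_bytes)   # b'caf&#233;'
print(latin1_bytes)  # b'caf\xe9'
print(utf8_bytes)    # b'caf\xc3\xa9'
```

All three byte sequences describe the same four characters; only the encoding differs.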

> I seem to consistently suffer from character set issues.

Then please specify them, with URLs, after checking the basic
resources.

> Of course, I can specify a specific character set

I'm afraid that could mean rather different things,

> - but that doesn't guarantee
> the receiving computer will have that set on board.


Indeed. The safest bet in practice is ASCII. The second-safest in
theory (and pretty much in practice too, in worldwide considerations)
is UTF-8, if you know how to produce and announce it. But I'm not sure
whether you mean character encoding, character repertoire, or font.
Three different beasts.
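
As a rough illustration of "produce and announce" (a sketch only, not from the original post: the function name `build_page` and the page skeleton are hypothetical), the following Python encodes a page as UTF-8 and states that encoding both in an HTTP `Content-Type` header and in a `meta` element:

```python
import html


def build_page(body_text):
    """Return (headers, body_bytes) for a small UTF-8 HTML page.

    Hypothetical helper for illustration; a real server would send
    the headers and bytes over HTTP.
    """
    encoding = "utf-8"
    page = (
        "<html><head>"
        '<meta http-equiv="Content-Type" '
        'content="text/html; charset=utf-8">'
        "<title>Test</title></head>"
        "<body><p>" + html.escape(body_text) + "</p></body></html>"
    )
    body = page.encode(encoding)  # produce: text -> UTF-8 bytes
    headers = {
        # announce: tell the receiver which encoding was used
        "Content-Type": "text/html; charset=" + encoding,
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_page("na\u00efve \u20ac")
print(headers["Content-Type"])  # text/html; charset=utf-8
```

Note that this only settles the encoding. Whether the reader's system has a font with glyphs for the characters is a separate matter.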

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #2

On Sun, 7 Mar 2004, Hans Mabelis wrote:

> I'm new here; got here because suddenly the question came up: is html a
> 7-bit or an 8-bit language?

No. Not since RFC2070 and HTML4.*

> I seem to consistently suffer from character set issues.

That's a bit vague. Do you want to understand the underlying
principles (which is what I would recommend) or are you experiencing
specific problems (in which case you'd need to say a bit more about
what they are, and preferably put some of the problematic materials
online so that people can see for themselves what's going on).

> Of course, I can specify a specific character set


Actually no. The Document Character Set is always ISO 10646/Unicode.
What you _can_ specify is the character encoding, which in MIME
terminology is confusingly called "charset". Until you understand the
difference, none of this stuff is likely to make much sense, I'm
afraid.
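
One way to see the difference is a small Python sketch (an illustration only, not from the original post; the sample string is an assumption). Numeric character references such as `&#233;` name Unicode code points, so they resolve to the same characters whatever encoding carried the bytes:

```python
import html

# Sample markup using only ASCII bytes plus numeric character references.
ref_text = "caf&#233; costs &#8364;5"
expected = "caf\u00e9 costs \u20ac5"

# The "charset" (encoding) changes how bytes map to characters, but the
# references are resolved against Unicode in every case.
for encoding in ("ascii", "iso-8859-1", "utf-8"):
    raw = ref_text.encode(encoding)
    resolved = html.unescape(raw.decode(encoding))
    assert resolved == expected

# The number in the reference is the Unicode code point, not the byte
# value the character has in some particular encoding.
euro = html.unescape("&#8364;")
print(hex(ord(euro)))         # 0x20ac
print(euro.encode("cp1252"))  # b'\x80'
print(euro.encode("utf-8"))   # b'\xe2\x82\xac'
```

The euro sign is code point U+20AC in the document character set, yet it is one byte in windows-1252 and three bytes in UTF-8.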

Some people have found the materials in my area
http://ppewww.ph.gla.ac.uk/~flavell/charset/ to be of use.

But RFC2070 itself isn't bad, even if it's somewhat dated. The
description of the character representation model in HTML/4.01 is also
reasonably clear. The hardest part is often un-learning things that
the student is convinced that they already understand.

good luck
Jul 20 '05 #3
