preferred charset?

I have been using the charset windows-1252 for a while, but it was
pointed out to someone else in this group recently that it's a
Microsoft creation (I'm sure I'm getting my facts wrong or skewed) and
therefore not good for cross-platform browsing.
Anyway, I am beginning my road to recovery (ie, breaking my addiction
to authoring only for IE) and I would like to know what is the
preferred charset?
I have tried a search and only find immense lists that make me
cross-eyed without ever telling me which to use to utilize a full
range of characters and have them display the way I intend on
English-speaking machines.
I'm not sure of the proper term, but I always use the & character
substitutes for anything that doesn't show up on my keyboard so,
ideally, the charset should display those, right? (For instance, if I
want to display Montréal, I would input Montr&eacute;al.)
Thanks!
Jul 20 '05 #1
22 Replies


Jane Withnolastname <Ja**********************@yahoo.com> wrote:
Anyway, I am beginning my road to recovery (ie, breaking my addiction
to authoring only for IE) and I would like to know what is the
preferred charset?
Probably UTF-8.

;K

Jul 20 '05 #2

On Wed, 27 Aug 2003 22:13:47 -0500, Mad Bad Rabbit
<ma**********@yahoo.com> wrote:
Jane Withnolastname <Ja**********************@yahoo.com> wrote:
Anyway, I am beginning my road to recovery (ie, breaking my addiction
to authoring only for IE) and I would like to know what is the
preferred charset?


Probably UTF-8.


I tried UTF-8 a while ago (thinking it was the right one) and went
back to the windows one because I got odd results from it. However,
now that you have suggested it, I have gone back to the file I had
problems with and see that I was using open and close quotes (instead
of the regular quotes on the keyboard). I don't know how they got in
there, because I really had to search to figure out how I got them, so
I must have copy&pasted it.
Anyway, here's a sorta related question: is it acceptable to write
ASCII codes into html? As in the above example, it would be &#147; for
the open quote, and &#148; for the close quote.
Is that acceptable, or is there another way, similar to the preferred
method of using &eacute; rather than &#233;?
Or would I be better advised to stick with regular quotes and never
mind special ASCII-only characters?
And while we're on quotes ... is it acceptable to use the quote key to
put them in a file, or is it better to use &quot;?
Thanks again. I'm feeling quite stupid right now :)

P.S. Is there a list somewhere of all the alternate characters? I'd
try a search, but I don't know the proper term for these.
Jul 20 '05 #3

Jane Withnolastname <Ja**********************@yahoo.com> wrote:
Anyway, here's a sorta related question: is it acceptable to write
ASCII codes into html? As in the above example, it would be &#147; for
the open quote, and &#148; for the close quote.
Not if you're declaring the codeset to be Unicode.
&#147; isn't a double-quote unless you use codeset 1252.

To find a given character in Unicode, go to:

http://www.unicode.org/charts/

and look at (for example) "General Punctuation" chart.
In Unicode, the double-quotes you want are assigned
codes &#x201C; and &#x201D;
Is that acceptable, or is there another way, similar
to the preferred method of using &eacute; rather than &#233;?
Good question: yes there are. See

http://www.htmlhelp.com/reference/ht...s/special.html

You can use entities &ldquo; and &rdquo; for these characters.
(which is a lot easier to remember than the numeric codes).
P.S. Is there a list somewhere of all the alternate characters? I'd
try a search, but I don't know the proper term for these.
They're called "HTML entities", and the above site has lists.
HTH
;K

Jul 20 '05 #4

Jane Withnolastname wrote:
I have been using the charset windows-1252 for a while, but it was
pointed out to someone else in this group recently that it's a
Microsoft creation (I'm sure I'm getting my facts wrong or skewed) and
therefore not good for cross-platform browsing.
Anyway, I am beginning my road to recovery (ie, breaking my addiction
to authoring only for IE) and I would like to know what is the
preferred charset?
I have tried a search and only find immense lists that make me
cross-eyed without ever telling me which to use to utilize a full
range of characters and have them display the way I intend on
English-speaking machines.
I'm not sure of the proper term, but I always use the & character
substitutes for anything that doesn't show up on my keyboard so,
ideally, the charset should display those, right? (For instance, if I
want to display Montréal, I would input Montr&eacute;al.)


I use ISO-8859-1 because it allows me to dispense with character
references like &eacute; the source readability is much better without
those codes.
Headless

--
Email and usenet filter list: http://www.headless.dna.ie/usenet.htm
Jul 20 '05 #5

Mad Bad Rabbit <ma**********@yahoo.com> wrote:
Anyway, here's a sorta related question: is it acceptable to write
ASCII codes into html? As in the above example, it would be &#147; for
the open quote, and &#148; for the close quote.


Not if you're declaring the codeset to be Unicode.
&#147; isn't a double-quote unless you use codeset 1252.


&#147; is undefined in HTML, no matter what you "declare" anywhere.
This has been discussed dozens of times.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #6

On Thu, Aug 28, Mad Bad Rabbit inscribed on the eternal scroll:
Jane Withnolastname <Ja**********************@yahoo.com> wrote:
Anyway, here's a sorta related question: is it acceptable to write
ASCII codes into html?
ASCII is a 7-bit code, and has displayable characters in the range 32
(space) to 126 inclusive. (x20 to x7e).
As in the above example, it would be &#147; for
the open quote, and &#148; for the close quote.

No.
Not if you're declaring the codeset to be Unicode.
It would be helpful if you'd refrain from offering answers until you
understand them.
&#147; isn't a double-quote unless you use codeset 1252.


&#number; notations in the range 127 to 159 inclusive are undefined in
HTML, and illegal in XHTML. No matter what this or that browser might
happen to display when presented with them.
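
A minimal Python sketch of the point above, assuming Python's `cp1252` codec as a stand-in for the proprietary Windows encoding. It shows that 147 and 148 are curly quotes only as windows-1252 byte values, while the same numbers read as Unicode code points are control characters:

```python
import unicodedata

# Bytes 0x93 (147) and 0x94 (148) are curly quotes only in windows-1252.
cp1252_quotes = bytes([0x93, 0x94]).decode("cp1252")

# The code points those characters actually have in Unicode.
quote_code_points = [f"U+{ord(ch):04X}" for ch in cp1252_quotes]

# Read as Unicode code points, 128-159 are all C1 control characters ("Cc").
c1_categories = {unicodedata.category(chr(n)) for n in range(128, 160)}

print(quote_code_points)  # ['U+201C', 'U+201D']
print(c1_categories)      # {'Cc'}
```

A numeric reference names a code point in the document character set, not a byte in whatever encoding the file happens to use, which is why the valid references for these quotes are built from 201C and 201D (8220 and 8221 in decimal).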

I don't believe your term "codeset" means anything in HTML, SGML, XML
or XHTML. It seems to be some confused conflation of the terms
"character code" (or maybe "code page") and "character set". These
are distinct concepts in HTML/XHTML, and any attempt to muddle them up
is sure to be unhelpful.

have fun.
Jul 20 '05 #7

In article <in********************************@4ax.com> in
comp.infosystems.www.authoring.html, Jane Withnolastname
<Ja**********************@yahoo.com> wrote:
On Wed, 27 Aug 2003 22:13:47 -0500, Mad Bad Rabbit
<ma**********@yahoo.com> wrote:
Jane Withnolastname <Ja**********************@yahoo.com> wrote:
I would like to know what is the preferred charset?
Probably UTF-8.


Good advice. See
<http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist.html>.
Anyway, here's a sorta related question: is it acceptable to write
ASCII codes into html?
Yes, though it's unnecessary excel;t for > and &.
As in the above example, it would be &#147; for
the open quote, and &#148; for the close quote.
No, those are not ASCII. They're not even Unicode; they are
Microsoft creations. Any reference between &#128; and &#159;
inclusive is wrong.

You can create an open quote in three legal ways:
" &ldquo; “
and a close quote in three legal ways:
" &rdquo; ”

The straight quote " works in all browsers without exception. If its
appearance is acceptable to you (and it should be, since a great
many Web sites use it), you need look no further.

If you really want curly quotes, use the "entities" or the numeric
references. Most browsers treat them exactly the same; a few (like
Netscape 4 if I recall correctly) will handle the numeric references
correctly but not the entities.
Is that acceptable, or is there another way, similar to the preferred
method of using &eacute; rather than &#233;?
How is that the "preferred method"? The two are the same. As far as
I know, browser support is the same.
Or would I be better advised to stick with regular quotes and never
mind special ASCII-only characters?
Yes, I think so, if by "regular quotes" you mean the standard
double-quote character on the keyboard. I don't know what you mean
by "ASCII-only characters".
And while we're on quotes ... is it acceptable to use the quote key to
put them in a file, or is it better to use &quot;?
There is no reason to use &quot; ever, that I am aware of.
P.S. Is there a list somewhere of all the alternate characters? I'd
try a search, but I don't know the proper term for these.


"Numeric character references", but here's a terrific list:

http://www.alanwood.net/demos/ent4_frame.html

For numbers up to 255, it shouldn't matter whether you use the
number or the entity. For higher numbers, some browsers do a better
job with the number than with the entity.
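
As a rough check that the entity and the numeric reference name the same character, here is a small Python sketch using the standard `html` module. Note the assumption: `html.unescape` follows the later HTML 5 rules rather than HTML 4.01, so it is only fed valid references here.

```python
import html
import unicodedata

# Three spellings of the opening curly quote: entity, decimal, hexadecimal.
references = ["&ldquo;", "&#8220;", "&#x201C;"]
resolved = [html.unescape(ref) for ref in references]

# All three should collapse to one and the same character.
same_character = len(set(resolved)) == 1
quote_name = unicodedata.name(resolved[0])

# Likewise for e-acute: entity, numeric reference, and the literal character.
e_acute_forms = {html.unescape("&eacute;"), html.unescape("&#233;"), "\u00e9"}

print(same_character, quote_name)
print(e_acute_forms)
```

Which spelling a given old browser renders correctly is a separate question that this sketch cannot answer; it only shows that the references are equivalent by definition.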

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #8

Stan Brown <th************@fastmail.fm> writes:
There is no reason to use &quot; ever, that I am aware of.


<img src="quotechar.jpg" alt="&quot;">

--
Chris
Jul 20 '05 #9

In article <87************@dinopsis.dur.ac.uk> in
comp.infosystems.www.authoring.html, Chris Morris
<c.********@durham.ac.uk> wrote:
Stan Brown <th************@fastmail.fm> writes:
There is no reason to use &quot; ever, that I am aware of.


<img src="quotechar.jpg" alt="&quot;">


<img src="quotechar.jpg" alt='"'> -- even aside from the fact that
the example is extremely unlikely to occur in practice. :-)

"Single quote marks can be included within the attribute value when
the value is delimited by double quote marks, and vice versa."
http://www.w3.org/TR/html401/intro/sgmltut.html#h-3.2.2

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #10

In article <MP************************@news.odyssey.net> in
comp.infosystems.www.authoring.html, Stan Brown
<th************@fastmail.fm> wrote:
In article <in********************************@4ax.com> in
comp.infosystems.www.authoring.html, Jane Withnolastname
Anyway, here's a sorta related question: is it acceptable to write
ASCII codes into html?


Yes, though it's unnecessary excel;t for > and &.


Hmm -- I'm not sure how that got mangled. It should have read
"except for < > and &."

ASCII codes run 0 to 127; of them numbers 32 to 126 are displayable
(though 32 "displays" as a space).
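
The ranges can be checked with a short Python sketch. It relies on `str.isprintable`, which treats the ASCII space as printable and the control codes (0 to 31, plus 127) as not:

```python
# Collect every ASCII code (0-127) whose character Python considers printable.
displayable = [n for n in range(128) if chr(n).isprintable()]

first_displayable = displayable[0]   # 32, the space
last_displayable = displayable[-1]   # 126, the tilde
displayable_count = len(displayable)

print(first_displayable, last_displayable, displayable_count)
```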

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #11

Stan Brown <th************@fastmail.fm> writes:
In article <87************@dinopsis.dur.ac.uk> in
comp.infosystems.www.authoring.html, Chris Morris
<c.********@durham.ac.uk> wrote:
Stan Brown <th************@fastmail.fm> writes:
There is no reason to use &quot; ever, that I am aware of.
<img src="quotechar.jpg" alt="&quot;">


<img src="quotechar.jpg" alt='"'> -- even aside from the fact that
the example is extremely unlikely to occur in practice. :-)


<img src="quoteandapos.png" alt="&quot '">

Even more unlikely, yes, but user input could potentially contain
both. More realistically on the image:

<img src="quotation.png" alt="&quot;Quotation&quot; - John O'Name">
"Single quote marks can be included within the attribute value when
the value is delimited by double quote marks, and vice versa."
http://www.w3.org/TR/html401/intro/sgmltut.html#h-3.2.2


Doing both at once remains a bit more difficult.
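
For text that arrives as user input, one way to handle both kinds of quote at once is to escape them before building the attribute. A minimal Python sketch, assuming the standard `html.escape` function and reusing the alt text from the example above:

```python
import html

# Alt text containing both a double quote and an apostrophe.
alt_text = '"Quotation" - John O\'Name'

# quote=True escapes " as &quot; and ' as &#x27;, so either delimiter is safe.
escaped = html.escape(alt_text, quote=True)
img_tag = f'<img src="quotation.png" alt="{escaped}">'

# Unescaping recovers the original text unchanged.
round_trip = html.unescape(escaped)

print(img_tag)
```

The apostrophe comes out as a numeric reference rather than a named entity, so the result does not depend on the browser knowing an entity name for it.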

--
Chris
Jul 20 '05 #12

On Thu, 28 Aug 2003, Stan Brown wrote:
<img src="quotechar.jpg" alt="&quot;">


<img src="quotechar.jpg" alt='"'> -- even aside from the fact that
the example is extremely unlikely to occur in practice. :-)


http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
http://www.cl.cam.ac.uk/~mgk25/ucs/apostrophe.html

Jul 20 '05 #13

On Thu, 28 Aug 2003, Jane Withnolastname wrote:
Anyway, I am beginning my road to recovery (ie, breaking my addiction
to authoring only for IE) and I would like to know what is the
preferred charset?

There is none. It depends on *your* special situation.
http://ppewww.ph.gla.ac.uk/~flavell/...checklist.html
Or would I be better advised to stick with regular quotes and never
mind special ASCII-only characters?
It might be preferable to use only ASCII quotes (" '). See
http://ppewww.ph.gla.ac.uk/~flavell/...cklist.html#s3
Thanks again. I'm feeling quite stupid right now :)
No, no. Perfectly valid questions.
Is there a list somewhere of all the alternate characters?


You probably don't need all. Take
http://www.unics.uni-hannover.de/nht...2.html#symbols
as a starting point.

Jul 20 '05 #14

On Thu, 28 Aug 2003 08:24:25 +0100, Headless <me@privacy.net> wrote:
Jane Withnolastname wrote:
I have been using the charset windows-1252 for a while, but it was
pointed out to someone else in this group recently that it's a
Microsoft creation (I'm sure I'm getting my facts wrong or skewed) and
therefore not good for cross-platform browsing.
Anyway, I am beginning my road to recovery (ie, breaking my addiction
to authoring only for IE) and I would like to know what is the
preferred charset?
I have tried a search and only find immense lists that make me
cross-eyed without ever telling me which to use to utilize a full
range of characters and have them display the way I intend on
English-speaking machines.
I'm not sure of the proper term, but I always use the & character
substitutes for anything that doesn't show up on my keyboard so,
ideally, the charset should display those, right? (For instance, if I
want to display Montréal, I would input Montr&eacute;al.)


I use ISO-8859-1 because it allows me to dispense with character
references like &eacute; the source readability is much better without
those codes.
Headless


So I've got one vote for utf-8 and one vote for iso-8859-1 and
everybody else just wants to argue about quotes, which was so not the
point, to begin with.
Can I get a consensus?
It depends on what I'm using it for? OK, it's a general-use site aimed
at an English-speaking audience that may, at some time or another,
need to use non-English characters, such as é or ç. I need it to
display on all browsers and would be nice (but not necessary) if it
was printable on most printers.

If I understand correctly, this ISO charset will allow me to simply
input é and it will display correctly in all browsers?

Someone questioned my saying that entity rather than number was the
preferred method. Well, it's what I read on this newsgroup only a few
days ago, when someone was asking about the Euro character. The person
had said that it was written with the numerical identifier and was
advised to change it to the entity.

I apologize for apparently having no idea that ASCII stopped at 127. I
learned everything I know about ASCII in high school, something like
15 years ago. Some of it may have been wrong and some may have meshed
with what I *thought* was fact.... Anyway, thanks for straightening me
out on that.

Thanks!
Jul 20 '05 #15

In article <28*************************@rrzn-user.uni-hannover.de>
in comp.infosystems.www.authoring.html, Andreas Prilop
<nh******@rrzn-user.uni-hannover.de> wrote:
Stan Brown <th************@fastmail.fm> wrote:
I didn't write the above -- in fact I disagree with it. PLEASE be
careful with attributions!


You quoted it. Therefore the line has an additional quote mark (>).
Some newsreaders and Google
http://groups.google.com/groups?th=375726c4206f6e49
even display different quoting levels in different colours.
This is an elementary fact of Usenet quoting.


It's an elementary fact of how Usenet quoting is _supposed_ to be.
So many people in fact screw up the quote widgets that the mere
presence or absence of an extra widget is no guide to who said what.

How would you like having elementary errors attributed to you --
especially given that those attributions stay around for all time in
the Google archives?

I must confess I am surprised to see you defending misquoting
someone. By your "logic", the psalmist who wrote "The fool says in
his heart, 'there is no god'" would not object if someone claimed
that he himself said "there is no god".

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #16

In article <d2********************************@4ax.com> in
comp.infosystems.www.authoring.html, Jane Withnolastname
<Ja**********************@yahoo.com> wrote:
So I've got one vote for utf-8 and one vote for iso-8859-1 and
everybody else just wants to argue about quotes, which was so not the
point, to begin with.
Can I get a consensus?


I understand and sympathize with your wish to ask questions on (what
look like) small unrelated issues and get simple unambiguous
answers.

The problem is that things don't work that way. Your questions are
in fact related, and the right answer to them depends on other facts
which you have not told us. That is why some of us have posted
references by which you can educate yourself on these issues.

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #17

Jane Withnolastname wrote:
If I understand correctly, this ISO charset will allow me to simply
input é and it will display correctly in all browsers?


There are several variables that go into choosing a charset and
associated choices, read the supplied references if you're interested.

If you're not particularly interested, ISO-8859-1 should work fine,
screen and print, all browsers.
Headless

--
Email and usenet filter list: http://www.headless.dna.ie/usenet.htm
Jul 20 '05 #18

Jane Withnolastname wrote:
So I've got one vote for utf-8 and one vote for iso-8859-1 and
everybody else just wants to argue about quotes, which was so not the
point, to begin with.
Can I get a consensus?
If you're mostly going to need characters from Western European
languages, ISO-8859-1 is a reasonable choice; it would let you put
characters directly from the normal Windows character set (as usually
configured in the U.S. and Western Europe) as long as you avoided the
range from #128-#159, which are control characters not permitted in HTML
(even though the proprietary Windows character set has printable
characters in that range). Characters other than those in iso-8859-1
would have to be added via numeric references or entity names (as noted
elsewhere in this thread regarding "curly quotes"; this is also true of
characters from other languages such as Hebrew or Chinese).

UTF-8 would permit the direct inclusion of the full range of Unicode
characters, but would require you to use an editing program that knows
how to generate data in this encoding (which requires multiple bytes for
characters outside the US-ASCII 7-bit range). In a UTF-8 document, you
wouldn't be able to paste in a character such as é or ç directly unless
your editor converted it appropriately; the 8-bit ISO-8859-1 reference
wouldn't be valid. If you just used US-ASCII with all other characters
represented as numeric or entity references, that would be valid,
however, since the US-ASCII range is represented identically in
ISO-8859-1 and UTF-8.
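
The byte-level difference can be seen in a short Python sketch (the sample word is borrowed from the original question; the variable names are only illustrative):

```python
# "Montréal", written with an escape so the sketch does not depend on
# how this source file itself is encoded.
text = "Montr\u00e9al"

latin1_bytes = text.encode("iso-8859-1")  # e-acute is the single byte 0xE9
utf8_bytes = text.encode("utf-8")         # e-acute is the two bytes 0xC3 0xA9

# The ISO-8859-1 bytes are not a valid UTF-8 sequence.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

# A pure US-ASCII source with an entity is byte-for-byte the same in all three.
ascii_only = "Montr&eacute;al"
ascii_identical = (
    ascii_only.encode("ascii")
    == ascii_only.encode("iso-8859-1")
    == ascii_only.encode("utf-8")
)

print(latin1_bytes, utf8_bytes, latin1_is_valid_utf8, ascii_identical)
```

This is why a file saved as ISO-8859-1 but labelled UTF-8 shows broken characters, while an ASCII-plus-references file can carry either label.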
Someone questioned my saying that entity rather than number was the
preferred method. Well, it's what I read on this newsgroup only a few
days ago, when someone was asking about the Euro character. The person
had said that it was written with the numerical identifier and was
advised to change it to the entity.
That's because that person was using an invalid numerical reference for
the Euro character; I think they were using the number of its position
in (some versions of) the proprietary Windows encoding, rather than its
proper Unicode number. The Euro character is especially problematic
because it was only added to character sets relatively recently compared
to other special characters, and hence is not in ISO-8859-1 or even in
early versions of the proprietary Windows character set, but is in one
of the character positions in the current Windows set that is actually a
control character in ISO-8859-1 and Unicode.
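
A minimal Python sketch of the euro situation, assuming Python's `cp1252` codec matches the current Windows set described above:

```python
euro = "\u20ac"  # the euro sign, Unicode code point U+20AC (8364 decimal)

# In windows-1252 the euro sits at byte 0x80 (128), inside the C1 range.
cp1252_byte = euro.encode("cp1252")

# ISO-8859-1 has no euro sign at all, so encoding fails.
try:
    euro.encode("iso-8859-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False

# Asking the codec to substitute a numeric reference gives the valid form.
numeric_reference = euro.encode("iso-8859-1", errors="xmlcharrefreplace").decode("ascii")

print(cp1252_byte, euro_in_latin1, numeric_reference)
```

So in an ISO-8859-1 page the euro has to be written as a reference built from its Unicode number, or as the entity; the Windows position 128 is not a valid reference for it.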
I apologize for apparently having no idea that ASCII stopped at 127. I
learned everything I know about ASCII in high school, something like
15 years ago. Some of it may have been wrong and some may have meshed
with what I *thought* was fact.... Anyway, thanks for straightening me
out on that.


More character set info:
http://webtips.dan.info/char.html
http://mailformat.dan.info/body/charsets.html

--
== Dan ==
Dan's Mail Format Site: http://mailformat.dan.info/
Dan's Web Tips: http://webtips.dan.info/
Dan's Domain Site: http://domains.dan.info/

Jul 20 '05 #19

Stan Brown wrote:
In article <28*************************@rrzn-user.uni-hannover.de>
in comp.infosystems.www.authoring.html, Andreas Prilop
<nh******@rrzn-user.uni-hannover.de> wrote:
Stan Brown <th************@fastmail.fm> wrote:

I didn't write the above -- in fact I disagree with it. PLEASE be
careful with attributions!


You quoted it. Therefore the line has an additional quote mark (>).
Some newsreaders and Google
http://groups.google.com/groups?th=375726c4206f6e49
even display different quoting levels in different colours.
This is an elementary fact of Usenet quoting.

It's an elementary fact of how Usenet quoting is _supposed_ to be.
So many people in fact screw up the quote widgets that the mere
presence or absence of an extra widget is no guide to who said what.


But how many layers of attributions should be left there? In long
threads, I sometimes see 4 or more layers of attributions at the top,
then various levels of quoting. It's too much for me to sort through.
I normally look only at the first attribution. And when I reply, I
generally trim the extra ones to keep the reply readable. (I left
them here out of deference to the immediate topic.)

--
Brian
follow the directions in my address to email me

Jul 20 '05 #20

Jane Withnolastname <Ja**********************@yahoo.com> wrote:
OK, it's a general-use site aimed
at an English-speaking audience that may, at some time or another,
need to use non-English characters, such as é or ç.
If I understand correctly, this ISO charset will allow me to simply
input é and it will display correctly in all browsers?
Yes - assuming you mean ISO-8859-1.
Someone questioned my saying that entity rather than number was the
preferred method.
You can write é or &eacute; or &#233; .
http://ppewww.ph.gla.ac.uk/~flavell/...cklist.html#s2
Well, it's what I read on this newsgroup only a few
days ago, when someone was asking about the Euro character.
It's "euro", not "Euro" - same with "dollar" and "pound".
The person
had said that it was written with the numerical identifier and was
advised to change it to the entity.
Probably because he used the _incorrect_ reference &#128; which is
in fact undefined. The euro sign is _not_ included in ISO-8859-1;
therefore the situation is a bit different than with é .
I apologize for apparently having no idea that ASCII stopped at 127.


Many people don't know this. Even the Google Directory (too bad!)
<http://directory.google.com/Top/Science/Reference/Standards/Individual_Standards/ISO_646/>
<http://dmoz.org/Science/Reference/Standards/Individual_Standards/ISO_646/>
lists one reference that is complete bullshit.

--
http://www.unics.uni-hannover.de/nhtcapri/plonk.txt
Jul 20 '05 #21

Brian wrote:
But how many layers of attributions should be left there? In long
threads, I sometimes see 4 or more layers of attributions at the top


Kindly note that I only just saw the "how to quote" thread in ciwa-
stylesheets. Had I known it was there, I would not have introduced it
here. Feel free to respond here, or not, as you see fit.

--
Brian
follow the directions in my address to email me

Jul 20 '05 #22

In article <tNK3b.222909$cF.73323@rwcrnsc53> in
comp.infosystems.www.authoring.html, Brian
<us*****@mangymutt.com.invalid-remove-this-part> wrote:
But how many layers of attributions should be left there?


Exactly as many as there are layers of quoted text, no more and no
less.

If you're getting too many layers of attributions, you're probably
quoting too much and should trim it down. (I don't mean you in
particular; it's the universal "you".)

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #23
