
UTF-8 and Latin-1 characters

Since I am Swedish, I write website content mostly in Swedish
and using charset iso-8859-1. I have (just for testing) tried to use
utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html )
but the special Swedish characters don't come out right if I don't use
entities for them.

The Swedish characters in question are:
Latin letter a with ring above = &aring; (å)
Latin letter a with diaeresis = &auml; (ä)
Latin letter o with diaeresis = &ouml; (ö)

I realize I can use the entities, but I found a page with Swedish
content ( http://w1.318.comhem.se/~u31827122/scsiguide.html ) where
utf-8 is used and the characters come out right even without the use
of entities.

So my question to anybody who can give an answer, is how it comes that
I fail and somebody else can do it? I can't see anything different in
this other page that could make it possible :(

--
/Arne
Jul 20 '05 #1
10 Replies


Arne wrote:
Since I am Swedish, I write website content mostly in Swedish
and using charset iso-8859-1. I have (just for testing) tried to use
utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html )
but the special Swedish characters don't come out right if I don't use
entities for them.

To use UTF-8, you need to tell your editor to actually save the file
as UTF-8. Simply declaring it in the <meta/> tag does not actually make
the file UTF-8 encoded. For example, in Notepad on WinXP, choose
File>Save As… and select UTF-8 in the Encoding list. Other editors that
support UTF-8 will have the option available somewhere. This W3C I18N
document explains more.

http://www.w3.org/International/ques...lications.html
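
A minimal sketch of why the declaration alone is not enough, assuming Python 3 (which nobody in this thread uses; it is only here to show the bytes). The same three letters produce different bytes under ISO-8859-1 and UTF-8, and bytes saved in one encoding are misread when the browser is told to expect the other:

```python
# Illustrative only: Python is an assumption, not part of the original thread.
text = "åäö"

latin1_bytes = text.encode("iso-8859-1")  # one byte per letter
utf8_bytes = text.encode("utf-8")         # two bytes per letter

print(latin1_bytes)  # b'\xe5\xe4\xf6'
print(utf8_bytes)    # b'\xc3\xa5\xc3\xa4\xc3\xb6'

# File saved as ISO-8859-1 but declared as UTF-8: the bytes are not valid
# UTF-8, so a lenient decoder substitutes replacement characters.
misread_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# File saved as UTF-8 but read as ISO-8859-1: every letter turns into two
# unrelated characters.
misread_as_latin1 = utf8_bytes.decode("iso-8859-1")

print(misread_as_utf8)
print(misread_as_latin1)  # Ã¥Ã¤Ã¶
```

The first misreading is roughly what shows up as "missing symbol" boxes in a browser; the second is the familiar two-characters-per-letter garbage.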

Currently, your test page is saved as ISO-8859-1, or at least
something (maybe windows-1252) that shares the same character codes for
these special characters:

The Swedish characters in question are:
Latin letter a with ring above = &aring; (å)
Latin letter a with diaeresis = &auml; (ä)
Latin letter o with diaeresis = &ouml; (ö)


If you set the character encoding manually with your browser to
iso-8859-1, the characters seem to display correctly. However, there
are other problems with your page that need addressing as well.

This is from the source code of your test page:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
....

1. The encoding in the <?xml?> processing instruction does not match
that in the <meta/> element. It's probably better to omit it anyway,
because it puts IE into quirks mode, especially when the document is
being served as text/html.

2. The doctype is XHTML 1.1, but it's being served as text/html. The
spec explicitly says that it *must not*, and should be served as
application/xhtml+xml, text/xml or application/xml. So, unless you have
the ability to set up content negotiation and serve HTML4.01 or XHTML
1.0 Strict to IE (and anything else that only supports text/html) and
XHTML 1.1 to others that do support application/xhtml+xml, then I
recommend you either write HTML 4.01 or XHTML 1.0 strict.
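
Point 1 above can be checked mechanically. The following is a rough sketch, assuming Python 3 and a hypothetical helper named declared_encodings (neither comes from the thread); it uses simple regular expressions rather than a real parser, so treat it as an illustration only:

```python
import re

def declared_encodings(source: str) -> dict:
    """Return the encodings named in the XML declaration and the meta element.

    Hypothetical helper for illustration; regex matching is not a full parser.
    """
    xml_decl = re.search(r'<\?xml[^>]*\bencoding=["\']([^"\']+)["\']', source, re.I)
    meta = re.search(r'<meta[^>]*\bcharset=["\']?([\w-]+)', source, re.I)
    return {
        "xml": xml_decl.group(1).lower() if xml_decl else None,
        "meta": meta.group(1).lower() if meta else None,
    }

# The head of the test page as quoted above (doctype left out for brevity).
page = '''<?xml version="1.0" encoding="iso-8859-1"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
'''

found = declared_encodings(page)
print(found)  # {'xml': 'iso-8859-1', 'meta': 'utf-8'}
print("mismatch" if found["xml"] != found["meta"] else "consistent")
```

Note that this only compares the two declarations with each other; it says nothing about how the file was actually saved.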

--
Lachlan Hunt
http://www.lachy.id.au/
la**********@lachy.id.au.update.virus.scanners

Remove .update.virus.scanners to email me,
NO SPAM and NO VIRUSES!!!
Jul 20 '05 #2

Arne wrote:
Since I am Swedish, I write website content mostly in Swedish
and using charset iso-8859-1. I have (just for testing) tried to use
utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html )
but the special Swedish characters don't come out right if I don't use
entities for them.

So my question to anybody who can give an answer, is how it comes that
I fail and somebody else can do it? I can't see anything different in
this other page that could make it possible :(


1) Correct <?xml version="1.0" encoding="iso-8859-1"?>
(that's not the problem, but needs changing)

I think it's your text editor. What are you using? Check that it is
set to utf-8 mode.

The characters appear Chinese in my text editor, and as 'missing symbol'
in my browser.

--
Matt
Jul 20 '05 #3

Lachlan Hunt <la**********@lachy.id.au.update.virus.scanners> wrote:
2. The doctype is XHTML 1.1, but it's being served as text/html. The
spec explicitly says that it *must not*


Incorrect.

--
Spartanicus
Jul 20 '05 #4


Lachlan Hunt wrote:
Arne wrote:

Since I am Swedish, I write website content mostly in Swedish
and using charset iso-8859-1. I have (just for testing) tried to use
utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html )
but the special Swedish characters don't come out right if I don't use
entities for them.

To use UTF-8, you need to tell your editor to actually save the file
as UTF-8. Simply declaring it in the <meta/> tag does not actually make
the file UTF-8 encoded. For example, in Notepad on WinXP, choose
File>Save As… and select UTF-8 in the Encoding list. Other editors that
support UTF-8 will have the option available somewhere. This W3C I18N
document explains more.

http://www.w3.org/International/ques...lications.html


Thank's! I don't have WinXP (still on Win98) so my Notepad doesn't have
the character set option when saving. But I looked into Mozilla
Composer and could see it there. I never use Composer for editing
websites, so I had not seen it before :-)

Well, looking further in my editor for something similar to what
Composer has, I found it! Not in an obvious place for me, which is why I
had not seen it before, since the editor is new to me. But now I know
how to do it. Thank's for putting me on the track! :-)

Currently, your test page is saved as ISO-8859-1, or at least
something (maybe windows-1252) that shares the same character codes for
these special characters:

Oh, sorry. The whole test page was a mess after my "experiments" with
the meta tags, when I tried different settings. :-)

2. The doctype is XHTML 1.1, but it's being served as text/html. The
spec explicitly says that it *must not*, and should be served as
application/xhtml+xml, text/xml or application/xml. So, unless you have
the ability to set up content negotiation and serve HTML4.01 or XHTML
1.0 Strict to IE (and anything else that only supports text/html) and
XHTML 1.1 to others that do support application/xhtml+xml, then I
recommend you either write HTML 4.01 or XHTML 1.0 strict.


Well, as I learned it, XHTML 1.1 should (but is not required to) be
served as application/xhtml+xml, text/xml or application/xml, so I have
just been testing it as text/html. Those test pages still always
validate. But normally I use HTML 4.01 :-)

But I have also noticed that I don't really need the tag with
"Content-Type" and "charset" when I have the <?xml version="1.0"
encoding="utf-8"?> at the top of the page to get valid pages, and it works
fine in IE 6, but I don't know about other browsers and versions?

--
/Arne
Jul 20 '05 #5

On Sun, 11 Jul 2004, Spartanicus wrote:
Lachlan Hunt <la**********@lachy.id.au.update.virus.scanners> wrote:
2. The doctype is XHTML 1.1, but it's being served as text/html. The
spec explicitly says that it *must not*


Incorrect.


You'd both of you be doing the group a favour if you cited your
sources. But that's not the only side of this problem.

To the best of my knowledge, the XHTML/1.1 specification doesn't
address what content-type is suitable, but it seems to me that
http://www.w3.org/TR/2002/NOTE-xhtml...ypes-20020801/
while it uses the terms "should" (not) rather than "must" (not), makes
it clear enough that the use of text/html is appropriate only when the
compatibility rules of XHTML/1.0 Appendix C apply.

There is no corresponding compatibility recommendation for XHTML/1.1.
I can't see any point in leaving newcomers to believe that there's any
benefit in writing XHTML/1.1 and serving it as text/html.
Nit-picking over whether a W3C NOTE is authoritative or normative and
whether it uses the precise term MUST is an amusing digression in its
way, but it's hardly a useful piece of practical advice, which I think
would be useful here. (Except that it's all been said before -
and not just once).

Jul 20 '05 #6

Arne wrote:
But I have also noticed that I don't really need the tag with
"Content-Type" and "charset" when I have the <?xml version="1.0"
encoding="utf-8"?> at the top of the page to get valid pages, and it works
fine in IE 6, but I don't know about other browsers and versions?


Not if you're serving the document as text/html. In such cases, the
document is parsed as HTML/tag-soup, not XML, so the <?xml?> PI is
ignored, but it also sends IE into quirks mode, which can produce
different results, depending on what you're using, and the differences
between IE's quirks and more quirks (oops... I mean “standards
compliant”) modes :). Also, note that Appendix C [1] of the XHTML 1.0
spec recommends omitting the <?xml?> PI for compatibility with legacy
user agents.

Also, if you have the ability to do so, you should configure your
server to send the charset in the Content-Type HTTP header. At the
moment, it's only sending:

Content-Type: text/html

but it should be sending something like:

Content-Type: text/html; charset=utf-8
(after you correctly save your document as utf-8)
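
To make the difference between those two header values concrete, here is a small sketch, assuming Python 3 and its standard email.message module (my choice for illustration; the helper name charset_from_content_type is hypothetical). It pulls the charset parameter out of a Content-Type value the way a client might:

```python
from email.message import Message

def charset_from_content_type(header_value: str):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()

print(charset_from_content_type("text/html"))                 # None
print(charset_from_content_type("text/html; charset=utf-8"))  # utf-8
```

With the first header the browser gets no encoding from HTTP and has to fall back on other hints, such as the <meta/> element or its own guess.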

You should be able to configure that through a .htaccess file, if
your host allows. If that is done correctly, then you don't need to
declare it in the <meta/> tag, but note that if you're going to send the
document as X(HT)ML, then you shouldn't include the charset in the
Content-Type header [2], e.g.

Content-Type: application/xhtml+xml

and then you should either include the <?xml?> PI to declare the
charset, or you can only use UTF-8 or UTF-16 because they are the defaults.

[1] http://www.w3.org/TR/xhtml1/#C_1

(This is only a draft, but it's still got some good advice in it)
[2] http://www.w3.org/TR/2004/WD-webarch...05/#no-charset

--
Lachlan Hunt
http://www.lachy.id.au/
la**********@lachy.id.au.update.virus.scanners

Remove .update.virus.scanners to email me,
NO SPAM and NO VIRUSES!!!
Jul 20 '05 #7


Lachlan Hunt wrote:
Arne wrote:
But I have also noticed that I don't really need the tag with
"Content-Type" and "charset" when I have the <?xml version="1.0"
encoding="utf-8"?> at the top of the page to get valid pages, and it works
fine in IE 6, but I don't know about other browsers and versions?

Not if you're serving the document as text/html. In such cases, the
document is parsed as HTML/tag-soup, not XML, so the <?xml?> PI is
ignored, but it also sends IE into quirks mode, which can produce
different results, depending on what you're using, and the differences
between IE's quirks and more quirks (oops... I mean “standards
compliant”) modes :). Also, note that Appendix C [1] of the XHTML 1.0
spec recommends omitting the <?xml?> PI for compatibility with legacy
user agents.

Also, if you have the ability to do so, you should configure your
server to send the charset in the Content-Type HTTP header. At the
moment, it's only sending:

Content-Type: text/html

but it should be sending something like:

Content-Type: text/html; charset=utf-8
(after you correctly save your document as utf-8)

You should be able to configure that through a .htaccess file, if
your host allows. If that is done correctly, then you don't need to
declare it in the <meta/> tag, but note that if you're going to send the
document as X(HT)ML, then you shouldn't include the charset in the
Content-Type header [2], e.g.

Content-Type: application/xhtml+xml

and then you should either include the <?xml?> PI to declare the
charset, or you can only use UTF-8 or UTF-16 because they are the defaults.

[1] http://www.w3.org/TR/xhtml1/#C_1

(This is only a draft, but it's still got some good advice in it)
[2] http://www.w3.org/TR/2004/WD-webarch...05/#no-charset


Thank's a lot for the tip's, I will save them for future reading :-)

I have a new test page (http://w1.978.telia.com/~u97802964/test2.html)
Different content but still for same purpose. I can't do the .htaccess
for the test file, as it is on my ISP's server, but will be useful on
domain hosts.

I noticed after more testing with IE that it's very buggy at changing
encoding if I don't have the META with "Content-Type" and "charset" in
the file (I was forced to change it manually), so I put it back on the new
test page. I have also tried omitting the <?xml?> PI and it still
works and validates. But as I am only testing and learning, I leave it
on the test page.

I have done a lot of HTML, and got interested in learning XHTML and
XML as it can be useful in the future. :-)

About the rendering mode, it's easy to see in Mozilla if a page is
rendering as standard or quirks mode just by looking at the "view page
info". But it's not possible to see how IE is rendering the same page?

Thank's again for your input!

--
/Arne
Jul 20 '05 #8

"Arne" <ar********@telia.com> wrote in
comp.infosystems.www.authoring.html:
Thank's a lot for the tip's, I will save them for future reading :-)


You might enjoy the book EATS, SHOOTS & LEAVES by Lynne Truss.

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com
"Sticklers unite! You have nothing to lose but your sense of
proportion (and arguably you didn't have a lot of that to
begin with)." -- Lynne Truss, /Eats, Shoots & Leaves/
Jul 20 '05 #9



Arne wrote:

About the rendering mode, it's easy to see in Mozilla if a page is
rendering as standard or quirks mode just by looking at the "view page
info". But it's not possible to see how IE is rendering the same page?


You could use a bookmarklet to show the document.compatMode property:
javascript: alert(document.compatMode); void 0
In IE5/5.5, which doesn't have any strict mode (it could be said to be
always in quirks mode, with its IE-only box model), that should alert
undefined
With IE 6, if the page is in quirks mode, that should show
BackCompat
and if it is in strict mode it should show
CSS1Compat

--

Martin Honnen
http://JavaScript.FAQTs.com/

Jul 20 '05 #10


Martin Honnen wrote:

Arne wrote:
About the rendering mode, it's easy to see in Mozilla if a page is
rendering as standard or quirks mode just by looking at the "view page
info". But it's not possible to see how IE is rendering the same page?

You could use a bookmarklet to show the document.compatMode property:
javascript: alert(document.compatMode); void 0
In IE5/5.5, which doesn't have any strict mode (it could be said to be
always in quirks mode, with its IE-only box model), that should alert
undefined
With IE 6, if the page is in quirks mode, that should show
BackCompat
and if it is in strict mode it should show
CSS1Compat


Thank's Martin for the tip, appreciate it!

--
/Arne
Jul 20 '05 #11
