UTF-8 and Latin-1 characters

Since I am Swedish, I write website content mostly in Swedish,
using the charset iso-8859-1. Just for testing, I have tried to use
utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html ),
but the special Swedish characters don't come out right if I don't use
entities for them.

The Swedish characters in question are:
Latin letter a with ring above = &aring; (å)
Latin letter a with diaeresis = &auml; (ä)
Latin letter o with diaeresis = &ouml; (ö)

I realize I can use the entities, but I found a page with Swedish
content ( http://w1.318.comhem.se/~u31827122/scsiguide.html ) where
utf-8 is used and the characters come out right even without the use
of entities.

So my question to anybody who can give an answer is: why do I fail
when somebody else can do it? I can't see anything different in
this other page that could make it possible :(

--
/Arne
Jul 20 '05 #1
Arne wrote:
Since I am Swedish, I write website content mostly in Swedish language
and using charset iso-8859-1. I have (just for testing) tried to use
utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html )
but the special Swedish characters don't come out right if I dont use
entities for them.
To use UTF-8, you need to tell your editor to actually save the file
as UTF-8. Simply declaring it in the <meta/> tag does not make the file
UTF-8 encoded. For example, in Notepad on WinXP, choose
File>Save As… and select UTF-8 in the Encoding list. Other editors that
support UTF-8 will have the option available somewhere. This W3C I18N
document explains more.

http://www.w3.org/International/ques...lications.html
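
As a rough illustration of the difference between declaring an encoding and saving in it, here is a minimal Python sketch of what an editor's "save as UTF-8" option amounts to. It is only a sketch: the file name test.html and the helper resave_as_utf8 are made up for the example, and it assumes the file really is ISO-8859-1 to begin with.

```python
import tempfile
from pathlib import Path


def resave_as_utf8(path: Path, source_encoding: str = "iso-8859-1") -> None:
    """Re-encode a text file in place from source_encoding to UTF-8."""
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))


# Hypothetical file, created in a temporary directory for the demonstration.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "test.html"
    page.write_bytes("<p>å ä ö</p>".encode("iso-8859-1"))

    latin1_bytes = page.read_bytes()  # one byte per Swedish letter
    resave_as_utf8(page)
    utf8_bytes = page.read_bytes()    # two bytes per Swedish letter

print(latin1_bytes)
print(utf8_bytes)
```

The markup is identical in both versions; only the bytes used for å, ä and ö change, which is why editing the <meta/> tag alone has no effect on them.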

Currently, your test page is saved as ISO-8859-1, or at least
something (maybe windows-1252) that shares the same character codes for
these special characters:

The Swedish characters in question is:
Latin letter a with ring above = &aring; (å)
Latin letter a with diaeresis = &auml; (ä)
Latin letter o with diaeresis = &ouml; (ö)
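
The point about ISO-8859-1 and windows-1252 sharing codes for these letters can be checked with a short Python sketch. The helper name byte_table is hypothetical; the codec names are the ones Python's standard library accepts.

```python
def byte_table(chars: str = "åäö") -> dict:
    """Map each character to its hex bytes in three encodings."""
    table = {}
    for ch in chars:
        table[ch] = {
            "iso-8859-1": ch.encode("iso-8859-1").hex(),
            "windows-1252": ch.encode("windows-1252").hex(),
            "utf-8": ch.encode("utf-8").hex(),
        }
    return table


for ch, row in byte_table().items():
    print(ch, row)
```

For all three letters the first two columns agree (a single byte each), while UTF-8 needs two bytes, so a browser cannot tell ISO-8859-1 from windows-1252 by looking at these characters alone.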


If you set the character encoding manually with your browser to
iso-8859-1, the characters seem to display correctly. However, there
are other problems with your page that need addressing as well.

This is from the source code of your test page:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
....
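
The two encoding declarations in this excerpt can be compared mechanically. The following Python sketch assumes that simple regular expressions are good enough for a short document head like this one; declared_encodings is a hypothetical helper, not a real HTML or XML parser.

```python
import re


def declared_encodings(source: str) -> dict:
    """Collect the encodings named in the XML declaration and the meta tag."""
    found = {}
    xml_decl = re.search(
        r'<\?xml[^>]*\bencoding=["\']([^"\']+)["\']', source, re.IGNORECASE
    )
    if xml_decl:
        found["xml"] = xml_decl.group(1).lower()
    meta = re.search(r'<meta[^>]*\bcharset=["\']?([\w-]+)', source, re.IGNORECASE)
    if meta:
        found["meta"] = meta.group(1).lower()
    return found


# The first lines of the test page, copied from the excerpt above.
head = '''<?xml version="1.0" encoding="iso-8859-1"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
'''

encodings = declared_encodings(head)
print(encodings)
print("consistent" if len(set(encodings.values())) <= 1 else "mismatch")
```

Neither declaration says anything about how the file was actually saved; the check only shows that the page contradicts itself.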

1. The encoding in the <?xml?> processing instruction does not match
that in the <meta/> element. It's probably better to omit it anyway,
because it puts IE into quirks mode, especially when the document is
being served as text/html.

2. The doctype is XHTML 1.1, but it's being served as text/html. The
spec explicitly says that it *must not*, and should be served as
application/xhtml+xml, text/xml or application/xml. So, unless you have
the ability to set up content negotiation and serve HTML4.01 or XHTML
1.0 Strict to IE (and anything else that only supports text/html) and
XHTML 1.1 to others that do support application/xhtml+xml, then I
recommend you either write HTML 4.01 or XHTML 1.0 strict.
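
For what the content negotiation mentioned in point 2 might look like, here is a deliberately simplified Python sketch. It assumes the server can see the request's Accept header and only asks whether application/xhtml+xml is listed explicitly with a non-zero q value; it does not implement full HTTP q-value ranking, and choose_media_type is a hypothetical name.

```python
def choose_media_type(accept_header: str) -> str:
    """Return application/xhtml+xml only if the client lists it explicitly."""
    for item in accept_header.split(","):
        media_type, _, params = item.strip().partition(";")
        if media_type.strip().lower() != "application/xhtml+xml":
            continue
        quality = 1.0  # default when no q parameter is given
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            return "application/xhtml+xml"
    # Wildcards such as */* do not count as explicit support.
    return "text/html"


print(choose_media_type("*/*"))
print(choose_media_type("text/html;q=0.9,application/xhtml+xml,*/*;q=0.5"))
```

A client that sends only */* therefore falls back to text/html, and would be given the HTML 4.01 or XHTML 1.0 Strict version of the page.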

--
Lachlan Hunt
http://www.lachy.id.au/
la**********@lachy.id.au.update.virus.scanners

Remove .update.virus.scanners to email me,
NO SPAM and NO VIRUSES!!!
Jul 20 '05 #2
Arne wrote:
Since I am Swedish, I write website content mostly in Swedish language
and using charset iso-8859-1. I have (just for testing) tried to use
utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html )
but the special Swedish characters don't come out right if I dont use
entities for them.

So my question to anybody who can give an answer, is how it comes that
I fail and somebody else can do it? I can't see anything different in
this other page that could make it possible :(


1) Correct <?xml version="1.0" encoding="iso-8859-1"?>
(that's not the problem, but needs changing)

I think it's your text editor. What are you using? Check that it is set to
utf-8 mode.

The characters appear as Chinese in my text editor, and as 'missing symbol' in
my browser.
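
A 'missing symbol' is what one would expect if ISO-8859-1 bytes are decoded as UTF-8. The Python sketch below works under that assumption (the actual bytes of the test page were not inspected for it): the single bytes for å, ä and ö are not valid UTF-8 sequences, so a lenient decoder substitutes the replacement character U+FFFD and a strict one refuses.

```python
latin1_bytes = "åäö".encode("iso-8859-1")

# Lenient decoding: invalid sequences become U+FFFD, the "missing symbol".
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
print(repr(as_utf8))

# Strict decoding: the same bytes are rejected outright.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print("valid UTF-8:", strict_ok)
```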

--
Matt
Jul 20 '05 #3
Lachlan Hunt <la**********@lachy.id.au.update.virus.scanners> wrote:
2. The doctype is XHTML 1.1, but it's being served as text/html. The
spec explicitly says that it *must not*


Incorrect.

--
Spartanicus
Jul 20 '05 #4

Lachlan Hunt wrote:
Arne wrote:

Since I am Swedish, I write website content mostly in Swedish language
and using charset iso-8859-1. I have (just for testing) tried to use
utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html )
but the special Swedish characters don't come out right if I dont use
entities for them.

To use UTF-8, you need to tell your editor to actually save the file
as UTF-8. Simple declaring it in the <meta/> tag does not actually make
it be encoded as UTF-8. For example, in Notepad on WinXP, choose
File>Save As… and select UTF-8 in the Encoding list. Other editors that
support UTF-8 will have the option available somewhere. This W3C I18N
document explains more.

http://www.w3.org/International/ques...lications.html


Thanks! I don't have WinXP (still on Win98), so my Notepad doesn't have
the character set option when saving. But I looked into Mozilla
Composer and could see it there. I never use Composer for editing
websites, so I had not seen it before :-)

Well, looking further in my editor for something similar to the option in
Composer, I found it! It is not in an obvious place, and the editor is
new to me, which is why I had not seen it before. But now I know
how to do it. Thanks for putting me on the track! :-)
Currently, your test page is saved as ISO-8859-1, or at least
something (maybe windows-1252) that shares the same character codes for
these special characters:
Oh, sorry. The whole test page was a mess after my "experiments" with
the meta tags, when I tried different settings. :-)
2. The doctype is XHTML 1.1, but it's being served as text/html. The
spec explicitly says that it *must not*, and should be served as
application/xhtml+xml, text/xml or application/xml. So, unless you have
the ability to set up content negotiation and serve HTML4.01 or XHTML
1.0 Strict to IE (and anything else that only supports text/html) and
XHTML 1.1 to others that do support application/xhtml+xml, then I
recommend you either write HTML 4.01 or XHTML 1.0 strict.


Well, as I learned it, the wording is that XHTML 1.1 should (not must) be
served as application/xhtml+xml, text/xml or application/xml, so I have
just been testing it as text/html. Those test pages still validate.
But normally I use HTML 4.01 :-)

But I have also noticed that I don't really need the tag with
"Content-Type" and "charset" to get valid pages when I have
<?xml version="1.0" encoding="utf-8"?> at the top of the page, and it works
fine in IE 6. But I don't know about other browsers and versions?

--
/Arne
Jul 20 '05 #5
On Sun, 11 Jul 2004, Spartanicus wrote:
Lachlan Hunt <la**********@lachy.id.au.update.virus.scanners> wrote:
2. The doctype is XHTML 1.1, but it's being served as text/html. The
spec explicitly says that it *must not*


Incorrect.


You'd both of you be doing the group a favour if you cited your
sources. But that's not the only side of this problem.

To the best of my knowledge, the XHTML/1.1 specification doesn't
address what content-type is suitable, but it seems to me that
http://www.w3.org/TR/2002/NOTE-xhtml...ypes-20020801/
while it uses the terms "should" (not) rather than "must" (not), makes
it clear enough that the use of text/html is appropriate only when the
compatibility rules of XHTML/1.0 Appendix C apply.

There is no corresponding compatibility recommendation for XHTML/1.1.
I can't see any point in leaving newcomers to believe that there's any
benefit in writing XHTML/1.1 and serving it as text/html.
Nit-picking over whether a W3C NOTE is authoritative or normative and
whether it uses the precise term MUST is an amusing digression in its
way, but it's hardly a useful piece of practical advice, which I think
would be useful here. (Except that it's all been said before -
and not just once).

Jul 20 '05 #6
Arne wrote:
But I have also noticed that I don't realy need the tag with
"Content-Type" and "charset" when I have the <?xml version="1.0"
encoding="utf-8"?> in top of the page to get valid pages, and it works
fine in IE 6 but I don't know about other browsers and versions?


Not if you're serving the document as text/html. In such cases, the
document is parsed as HTML/tag-soup, not XML, so the <?xml?> PI is
ignored, yet it still sends IE into quirks mode, which can produce
different results, depending on what you're using and on the differences
between IE's quirks and more quirks (oops... I mean “standards
compliant”) modes :). Also, note that Appendix C [1] of the XHTML 1.0
spec recommends omitting the <?xml?> PI for compatibility with legacy
user agents.

Also, if you have the ability to do so, you should configure your
server to send the charset in the Content-Type HTTP header. At the
moment, it's only sending:

Content-Type: text/html

but it should be sending something like:

Content-Type: text/html; charset=utf-8
(after you correctly save your document as utf-8)

You should be able to configure that through a .htaccess file, if
your host allows. If that is done correctly, then you don't need to
declare it in the <meta/> tag. Note, however, that if you're going to send the
document as X(HT)ML, then you shouldn't include the charset in the
Content-Type header [2], e.g.

Content-Type: application/xhtml+xml

and then you should either include the <?xml?> PI to declare the
charset, or you can only use UTF-8 or UTF-16 because they are the defaults.
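
To see what a client can read out of the header values shown above, here is a small Python sketch using the standard library's email.message.Message, which parses MIME-style headers. The helper name charset_from_content_type is hypothetical, and using the email package for an HTTP header is a convenience for illustration, not a claim about how any browser does it.

```python
from email.message import Message


def charset_from_content_type(header_value: str):
    """Split a Content-Type value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


for value in (
    "text/html",
    "text/html; charset=utf-8",
    "application/xhtml+xml",
):
    print(value, "->", charset_from_content_type(value))
```

When the charset comes back as None, the client has to fall back on something else: the <meta/> tag for text/html, or the <?xml?> PI (or the UTF-8/UTF-16 defaults) for XML.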

[1] http://www.w3.org/TR/xhtml1/#C_1

(This is only a draft, but it's still got some good advice in it)
[2] http://www.w3.org/TR/2004/WD-webarch...05/#no-charset

--
Lachlan Hunt
http://www.lachy.id.au/
la**********@lachy.id.au.update.virus.scanners

Remove .update.virus.scanners to email me,
NO SPAM and NO VIRUSES!!!
Jul 20 '05 #7

Thank's a lot for the tip's, I will save them for future reading :-)

I have a new test page (http://w1.978.telia.com/~u97802964/test2.html).
It has different content but serves the same purpose. I can't use .htaccess
for the test file, as it is on my ISP's server, but it will be useful on
domain hosts.

I noticed after more testing with IE that it is very buggy at changing
encoding if I don't have the META with "Content-Type" and "charset" in
the file (I was forced to change it manually), so I put it back on the new
test page. I have also tried omitting the <?xml?> PI, and the page still
works and validates. But as I am only testing and learning, I leave it
on the test page.

I have done a lot of HTML, and got interested in learning XHTML and
XML as they can be useful in the future. :-)
About the rendering mode, it's easy to see in Mozilla if a page is
rendering as standard or quirks mode just by looking at the "view page
info". But it's not possible to see how IE is rendering the same page?

Thank's again for your input!

--
/Arne
Jul 20 '05 #8
"Arne" <ar********@telia.com> wrote in
comp.infosystems.www.authoring.html:
Thank's a lot for the tip's, I will save them for future reading :-)


You might enjoy the book EATS, SHOOTS & LEAVES by Lynne Truss.

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com
"Sticklers unite! You have nothing to lose but your sense of
proportion (and arguably you didn't have a lot of that to
begin with)." -- Lynne Truss, /Eats, Shoots & Leaves/
Jul 20 '05 #9


Arne wrote:

About the rendering mode, it's easy to see in Mozilla if a page is
rendering as standard or quirks mode just by looking at the "view page
info". But it's not possible to see how IE is rendering the same page?


You could use a bookmarklet to show the document.compatMode property:
javascript: alert(document.compatMode); void 0
In IE5/5.5, which doesn't have any strict mode but could be said to be
always in quirks mode with its IE-only box model, that should alert
undefined
With IE 6, if it is in quirks mode, that should show
BackCompat
and if it is in strict mode, it should show
CSS1Compat

--

Martin Honnen
http://JavaScript.FAQTs.com/

Jul 20 '05 #10

Thanks for the tip, Martin, I appreciate it!

--
/Arne
Jul 20 '05 #11
