UTF-8 and Latin-1 characters

Since I am Swedish, I write website content mostly in Swedish,
using the charset iso-8859-1. I have (just for testing) tried to use
utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html ),
but the special Swedish characters don't come out right if I don't use
entities for them.

The Swedish characters in question are:
Latin letter a with ring above = &aring; (å)
Latin letter a with diaeresis = &auml; (ä)
Latin letter o with diaeresis = &ouml; (ö)

I realize I can use the entities, but I found a page with Swedish
content ( http://w1.318.comhem.se/~u31827122/scsiguide.html ) where
utf-8 is used and the characters come out right even without
entities.

So my question to anybody who can give an answer is: why do I fail
when somebody else can do it? I can't see anything different in
this other page that could make it possible :(

--
/Arne
Jul 20 '05 #1
Arne wrote:
Since I am Swedish, I write website content mostly in Swedish,
using the charset iso-8859-1. I have (just for testing) tried to use
utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html ),
but the special Swedish characters don't come out right if I don't use
entities for them.

To use UTF-8, you need to tell your editor to actually save the file
as UTF-8. Simply declaring it in the <meta/> tag does not make the file
UTF-8 encoded. For example, in Notepad on WinXP, choose
File>Save As… and select UTF-8 in the Encoding list. Other editors that
support UTF-8 will have the option available somewhere. This W3C I18N
document explains more.

http://www.w3.org/International/ques...lications.html
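
The difference between saving and declaring can be made concrete with a small sketch. Python is used here purely for illustration (nothing in this thread depends on it), and the helper name decodes_as_utf8 is made up for the example. It encodes the three Swedish letters both ways and checks whether each byte sequence is valid UTF-8:

```python
# Sketch: the same text produces different bytes depending on how it is saved.
text = "åäö"

latin1_bytes = text.encode("iso-8859-1")  # one byte per letter
utf8_bytes = text.encode("utf-8")         # two bytes per letter


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_bytes.hex())             # e5e4f6
print(utf8_bytes.hex())               # c3a5c3a4c3b6
print(decodes_as_utf8(latin1_bytes))  # False
print(decodes_as_utf8(utf8_bytes))    # True
```

A file saved as ISO-8859-1 holds the first byte sequence, so a browser told to expect UTF-8 cannot decode the letters, whatever the <meta/> tag says.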

Currently, your test page is saved as ISO-8859-1, or at least
something (maybe windows-1252) that shares the same character codes for
these special characters:

The Swedish characters in question are:
Latin letter a with ring above = &aring; (å)
Latin letter a with diaeresis = &auml; (ä)
Latin letter o with diaeresis = &ouml; (ö)
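
As a rough check of the "shares the same character codes" remark, the following Python sketch compares the two encodings for these three letters. The codec names are the ones Python's standard library uses ("cp1252" is its name for windows-1252), and can_encode is a hypothetical helper written for this example:

```python
# Sketch: ISO-8859-1 and windows-1252 agree on the Swedish letters.
text = "åäö"
latin1 = text.encode("iso-8859-1")
cp1252 = text.encode("cp1252")  # Python's codec name for windows-1252
same_for_swedish = latin1 == cp1252


def can_encode(ch: str, encoding: str) -> bool:
    """Return True if the character exists in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The encodings do differ elsewhere: the euro sign is only in windows-1252.
euro_in_cp1252 = can_encode("\u20ac", "cp1252")
euro_in_latin1 = can_encode("\u20ac", "iso-8859-1")

print(same_for_swedish, euro_in_cp1252, euro_in_latin1)  # True True False
```

This is why the bytes alone cannot tell you which of the two encodings the editor used for a page containing only å, ä and ö.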


If you set the character encoding manually with your browser to
iso-8859-1, the characters seem to display correctly. However, there
are other problems with your page that need addressing as well.

This is from the source code of your test page:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
....

1. The encoding in the <?xml?> processing instruction does not match
that in the <meta/> element. It's probably better to omit it anyway,
because it puts IE into quirks mode, especially when the document is
being served as text/html.

2. The doctype is XHTML 1.1, but it's being served as text/html. The
spec explicitly says that it *must not*, and should be served as
application/xhtml+xml, text/xml or application/xml. So, unless you have
the ability to set up content negotiation and serve HTML4.01 or XHTML
1.0 Strict to IE (and anything else that only supports text/html) and
XHTML 1.1 to others that do support application/xhtml+xml, then I
recommend you either write HTML 4.01 or XHTML 1.0 strict.
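
For readers wondering what content negotiation amounts to, here is a deliberately simplified Python sketch. The function name choose_content_type is hypothetical, the sample Accept values are only illustrative, and the sketch ignores q-values entirely, which a real server should not do:

```python
def choose_content_type(accept_header: str) -> str:
    """Pick a media type from an HTTP Accept header (simplified sketch).

    Assumption: any explicit mention of application/xhtml+xml counts as
    support; q-values and wildcards are ignored.
    """
    offered = [part.split(";")[0].strip().lower()
               for part in accept_header.split(",")]
    if "application/xhtml+xml" in offered:
        return "application/xhtml+xml"
    return "text/html"


# An Accept header that lists the XHTML type, and one that does not.
with_xhtml = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,*/*;q=0.5"
without_xhtml = "image/gif, image/jpeg, */*"

print(choose_content_type(with_xhtml))     # application/xhtml+xml
print(choose_content_type(without_xhtml))  # text/html
```

The server would then send XHTML 1.1 with the first type, and HTML 4.01 or XHTML 1.0 Strict with the second.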

--
Lachlan Hunt
http://www.lachy.id.au/
la**********@lachy.id.au.update.virus.scanners

Remove .update.virus.scanners to email me,
NO SPAM and NO VIRUSES!!!
Jul 20 '05 #2
Arne wrote:
Since I am Swedish, I write website content mostly in Swedish,
using the charset iso-8859-1. I have (just for testing) tried to use
utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html ),
but the special Swedish characters don't come out right if I don't use
entities for them.

So my question to anybody who can give an answer is: why do I fail
when somebody else can do it? I can't see anything different in
this other page that could make it possible :(


1) Correct <?xml version="1.0" encoding="iso-8859-1"?>
(that's not the problem, but needs changing)
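
A mismatch like this can be spotted mechanically. The Python sketch below is only an illustration: the function name declared_encodings and the two regular expressions are assumptions of this example, not a real HTML or XML parser. It pulls the encoding out of the XML declaration and out of the meta element so the two can be compared:

```python
import re

# Rough patterns, good enough for a quick check but not a real parser.
XML_DECL = re.compile(r'<\?xml[^>]*\bencoding=["\']([\w.-]+)["\']', re.IGNORECASE)
META_CHARSET = re.compile(r'<meta[^>]*\bcharset=["\']?([\w.-]+)', re.IGNORECASE)


def declared_encodings(source: str):
    """Return (xml_declaration_encoding, meta_charset), lowercased or None."""
    xml = XML_DECL.search(source)
    meta = META_CHARSET.search(source)
    return (
        xml.group(1).lower() if xml else None,
        meta.group(1).lower() if meta else None,
    )


# The head of the test page as quoted earlier in this thread.
page_head = (
    '<?xml version="1.0" encoding="iso-8859-1"?>\n'
    '<META http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
)

xml_enc, meta_enc = declared_encodings(page_head)
print(xml_enc, meta_enc, xml_enc == meta_enc)  # iso-8859-1 utf-8 False
```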

I think it's your text editor. What are you using? Check that it is set to
utf-8 mode.

The characters appear as Chinese in my text editor, and as 'missing symbol' in
my browser.
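
What a viewer shows depends on which wrong guess it makes. This Python sketch (for illustration only) reproduces two common outcomes: ISO-8859-1 bytes read as UTF-8, which gives the "missing symbol" replacement character, and the reverse case of UTF-8 bytes read as ISO-8859-1. It does not try to reproduce the Chinese-looking result, since that depends on which encoding the editor guessed.

```python
# Sketch: decoding bytes with the wrong encoding.
text = "åäö"

# Case 1: file saved as ISO-8859-1 but decoded as UTF-8.
# Invalid sequences become U+FFFD, the "missing symbol" character.
as_missing_symbols = text.encode("iso-8859-1").decode("utf-8", errors="replace")

# Case 2: file saved as UTF-8 but decoded as ISO-8859-1.
# Each letter turns into two unrelated characters.
as_mojibake = text.encode("utf-8").decode("iso-8859-1")

print("\ufffd" in as_missing_symbols)  # True
print(as_mojibake == "Ã¥Ã¤Ã¶")        # True
```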

--
Matt
Jul 20 '05 #3
Lachlan Hunt <la**********@lachy.id.au.update.virus.scanners> wrote:
2. The doctype is XHTML 1.1, but it's being served as text/html. The
spec explicitly says that it *must not*


Incorrect.

--
Spartanicus
Jul 20 '05 #4

Lachlan Hunt wrote:
Arne wrote:

Since I am Swedish, I write website content mostly in Swedish,
using the charset iso-8859-1. I have (just for testing) tried to use
utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html ),
but the special Swedish characters don't come out right if I don't use
entities for them.

To use UTF-8, you need to tell your editor to actually save the file
as UTF-8. Simply declaring it in the <meta/> tag does not make the file
UTF-8 encoded. For example, in Notepad on WinXP, choose
File>Save As… and select UTF-8 in the Encoding list. Other editors that
support UTF-8 will have the option available somewhere. This W3C I18N
document explains more.

http://www.w3.org/International/ques...lications.html


Thank's! I don't have WinXP (still on Win98), so my Notepad doesn't have
the character set option when saving. But I looked into Mozilla
Composer and could see it there. I never use Composer for editing
websites, so I had not seen it before :-)

Well, looking further in my editor for something similar to what
Composer has, I found it! Not in an obvious place for me, which is why I
had not seen it before, since the editor is new to me. But now I know
how to do it. Thank's for putting me on the track! :-)

Currently, your test page is saved as ISO-8859-1, or at least
something (maybe windows-1252) that shares the same character codes for
these special characters:

Oh, sorry. The whole test page was a mess after my "experiments" with
the meta tags, when I tried different settings. :-)

2. The doctype is XHTML 1.1, but it's being served as text/html. The
spec explicitly says that it *must not*, and should be served as
application/xhtml+xml, text/xml or application/xml. So, unless you have
the ability to set up content negotiation and serve HTML4.01 or XHTML
1.0 Strict to IE (and anything else that only supports text/html) and
XHTML 1.1 to others that do support application/xhtml+xml, then I
recommend you either write HTML 4.01 or XHTML 1.0 strict.


Well, as I learned it, the wording is that XHTML 1.1 "should" (not "must") be
served as application/xhtml+xml, text/xml or application/xml, so I have
just been testing it as text/html. Those test pages still validate
every time. But "normally" I use HTML 4.01 :-)

But I have also noticed that I don't really need the tag with
"Content-Type" and "charset" when I have the <?xml version="1.0"
encoding="utf-8"?> at the top of the page to get valid pages, and it works
fine in IE 6 but I don't know about other browsers and versions?

--
/Arne
Jul 20 '05 #5
On Sun, 11 Jul 2004, Spartanicus wrote:
Lachlan Hunt <la**********@lachy.id.au.update.virus.scanners> wrote:
2. The doctype is XHTML 1.1, but it's being served as text/html. The
spec explicitly says that it *must not*


Incorrect.


You'd both of you be doing the group a favour if you cited your
sources. But that's not the only side of this problem.

To the best of my knowledge, the XHTML/1.1 specification doesn't
address what content-type is suitable, but it seems to me that
http://www.w3.org/TR/2002/NOTE-xhtml...ypes-20020801/
while it uses the terms "should" (not) rather than "must" (not), makes
it clear enough that the use of text/html is appropriate only when the
compatibility rules of XHTML/1.0 Appendix C apply.

There is no corresponding compatibility recommendation for XHTML/1.1.
I can't see any point in leaving newcomers to believe that there's any
benefit in writing XHTML/1.1 and serving it as text/html.
Nit-picking over whether a W3C NOTE is authoritative or normative and
whether it uses the precise term MUST is an amusing digression in its
way, but it's hardly a useful piece of practical advice, which I think
would be useful here. (Except that it's all been said before -
and not just once).

Jul 20 '05 #6
Arne wrote:
But I have also noticed that I don't really need the tag with
"Content-Type" and "charset" when I have the <?xml version="1.0"
encoding="utf-8"?> at the top of the page to get valid pages, and it works
fine in IE 6 but I don't know about other browsers and versions?


Not if you're serving the document as text/html. In such cases, the
document is parsed as HTML/tag-soup, not XML, so the <?xml?> PI is
ignored, but it also sends IE into quirks mode, which can produce
different results, depending on what you're using, and the differences
between IE's quirks and more quirks (oops... I mean “standards
compliant”) modes :). Also, note that Appendix C [1] of the XHTML 1.0
spec recommends omitting the <?xml?> PI for compatibility with legacy
user agents.

Also, if you have the ability to do so, you should configure your
server to send the charset in the Content-Type HTTP header. At the
moment, it's only sending:

Content-Type: text/html

but it should be sending something like:

Content-Type: text/html; charset=utf-8
(after you correctly save your document as utf-8)
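
To show what sending the charset in the header means in practice, here is a minimal sketch of a Python WSGI application. It is not how your host's server is configured, and the names app and fake_start_response are made up for the example. The point is only that the header and the actual bytes of the body have to agree:

```python
def app(environ, start_response):
    """Tiny WSGI application that declares its charset in the HTTP header."""
    # The body must really be UTF-8 for the header below to be true.
    body = "<p>å ä ö</p>".encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Call the application directly, without a network, to inspect the headers.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


chunks = app({}, fake_start_response)
print(captured["headers"]["Content-Type"])  # text/html; charset=utf-8
```

To try it in a browser, the standard library's wsgiref.simple_server.make_server can serve app on a local port.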

You should be able to configure that through a .htaccess file, if
your host allows. If that is done correctly, then you don't need to
declare it in the <meta/> tag, but note that if you're going to send the
document as X(HT)ML, then you shouldn't include the charset in the
Content-Type header [2], e.g.

Content-Type: application/xhtml+xml

and then you should either include the <?xml?> PI to declare the
charset, or you can only use UTF-8 or UTF-16 because they are the defaults.
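
The XML defaulting rule can be tried out with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree, chosen only for illustration, and the variable names are made up. It parses the same element from UTF-8 bytes without a declaration, from ISO-8859-1 bytes with a declaration, and from ISO-8859-1 bytes without one:

```python
import xml.etree.ElementTree as ET

text = "åäö"

# UTF-8 without any declaration: accepted, because UTF-8 is a default.
utf8_doc = ("<a>" + text + "</a>").encode("utf-8")

# ISO-8859-1 with a matching declaration: accepted.
latin1_declared = (
    '<?xml version="1.0" encoding="iso-8859-1"?><a>' + text + "</a>"
).encode("iso-8859-1")

# ISO-8859-1 without a declaration: the parser assumes UTF-8 and fails.
latin1_undeclared = ("<a>" + text + "</a>").encode("iso-8859-1")

utf8_ok = ET.fromstring(utf8_doc).text == text
declared_ok = ET.fromstring(latin1_declared).text == text
try:
    ET.fromstring(latin1_undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(utf8_ok, declared_ok, undeclared_ok)  # True True False
```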

[1] http://www.w3.org/TR/xhtml1/#C_1

(This is only a draft, but it's still got some good advice in it)
[2] http://www.w3.org/TR/2004/WD-webarch...05/#no-charset

--
Lachlan Hunt
http://www.lachy.id.au/
la**********@lachy.id.au.update.virus.scanners

Remove .update.virus.scanners to email me,
NO SPAM and NO VIRUSES!!!
Jul 20 '05 #7

Lachlan Hunt wrote:
Arne wrote:
But I have also noticed that I don't really need the tag with
"Content-Type" and "charset" when I have the <?xml version="1.0"
encoding="utf-8"?> at the top of the page to get valid pages, and it works
fine in IE 6 but I don't know about other browsers and versions?

Not if you're serving the document as text/html. In such cases, the
document is parsed as HTML/tag-soup, not XML, so the <?xml?> PI is
ignored, but it also sends IE into quirks mode, which can produce
different results, depending on what you're using, and the differences
between IE's quirks and more quirks (oops... I mean “standards
compliant”) modes :). Also, note that Appendix C [1] of the XHTML 1.0
spec recommends omitting the <?xml?> PI for compatibility with legacy
user agents.

Also, if you have the ability to do so, you should configure your
server to send the charset in the Content-Type HTTP header. At the
moment, it's only sending:

Content-Type: text/html

but it should be sending something like:

Content-Type: text/html; charset=utf-8
(after you correctly save your document as utf-8)

You should be able to configure that through a .htaccess file, if
your host allows. If that is done correctly, then you don't need to
declare it in the <meta/> tag, but note that if you're going to send the
document as X(HT)ML, then you shouldn't include the charset in the
Content-Type header [2], e.g.

Content-Type: application/xhtml+xml

and then you should either include the <?xml?> PI to declare the
charset, or you can only use UTF-8 or UTF-16 because they are the defaults.

[1] http://www.w3.org/TR/xhtml1/#C_1

(This is only a draft, but it's still got some good advice in it)
[2] http://www.w3.org/TR/2004/WD-webarch...05/#no-charset


Thank's a lot for the tip's, I will save them for future reading :-)

I have a new test page (http://w1.978.telia.com/~u97802964/test2.html)
Different content but still for same purpose. I can't do the .htaccess
for the test file, as it is on my ISP's server, but will be useful on
domain hosts.

I noticed after more testing with IE that it is very buggy about changing
encoding if I don't have the META with "Content-Type" and "charset" in
the file (I was forced to change it manually), so I put it back on the new
test page. I have also tried omitting the <?xml?> PI, and it still
works and validates. But as I am only testing and learning, I leave it
on the test page.

I have done a lot of HTML, and got interested in learning XHTML and
XML as they can be useful in the future. :-)

About the rendering mode, it's easy to see in Mozilla if a page is
rendering in standards or quirks mode just by looking at the "view page
info". But is it not possible to see how IE is rendering the same page?

Thank's again for your input!

--
/Arne
Jul 20 '05 #8
"Arne" <ar********@telia.com> wrote in
comp.infosystems.www.authoring.html:
Thank's a lot for the tip's, I will save them for future reading :-)


You might enjoy the book EATS, SHOOTS & LEAVES by Lynne Truss.

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com
"Sticklers unite! You have nothing to lose but your sense of
proportion (and arguably you didn't have a lot of that to
begin with)." -- Lynne Truss, /Eats, Shoots & Leaves/
Jul 20 '05 #9


Arne wrote:

About the rendering mode, it's easy to see in Mozilla if a page is
rendering as standard or quirks mode just by looking at the "view page
info". But it's not possible to see how IE is rendering the same page?


You could use a bookmarklet to show the document.compatMode property:
javascript: alert(document.compatMode); void 0
In IE5/5.5, which doesn't have any strict mode but could be said to be
always in quirks mode with its IE-only box model, that should alert
undefined
With IE 6, if it is in quirks mode, that should show
BackCompat
and if it is in strict mode, it should show
CSS1Compat

--

Martin Honnen
http://JavaScript.FAQTs.com/

Jul 20 '05 #10
