#1
July 20th, 2005, 08:19 PM
Arne
UTF-8 and Latin-1 characters

Since I am Swedish, I write website content mostly in Swedish,
using charset iso-8859-1. I have (just for testing) tried to use
utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html )
but the special Swedish characters don't come out right if I don't use
entities for them.

The Swedish characters in question are:
Latin letter a with ring above = &aring; (å)
Latin letter a with diaeresis = &auml; (ä)
Latin letter o with diaeresis = &ouml; (ö)

I realize I can use the entities, but I found a page with Swedish
content ( http://w1.318.comhem.se/~u31827122/scsiguide.html ) where
utf-8 is used and the characters come out right even without the use
of entities.

So my question, to anybody who can give an answer, is why I fail
when somebody else can do it? I can't see anything different in
this other page that could make it possible :(

--
/Arne
#2
July 20th, 2005, 08:19 PM
Lachlan Hunt
Re: UTF-8 and Latin-1 characters

Arne wrote:
> Since I am Swedish, I write website content mostly in Swedish language
> and using charset iso-8859-1. I have (just for testing) tried to use
> utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html )
> but the special Swedish characters don't come out right if I don't use
> entities for them.

To use UTF-8, you need to tell your editor to actually save the file
as UTF-8. Simply declaring it in the <meta/> tag does not make the file
UTF-8 encoded. For example, in Notepad on WinXP, choose
File>Save As… and select UTF-8 in the Encoding list. Other editors that
support UTF-8 will have the option available somewhere. This W3C I18N
document explains more.

http://www.w3.org/International/ques...lications.html
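
As a rough illustration of why the declaration alone is not enough, here is a minimal Python sketch (Python is not used anywhere in this thread; it is only a convenient way to look at the bytes). It encodes the three Swedish letters both ways and checks whether each result is valid UTF-8. The ISO-8859-1 bytes are not, which is why a browser told to expect UTF-8 cannot display them. The helper name decodes_as_utf8 is made up for this example.

```python
# The three letters from the question, written as escapes so this file
# itself can be saved in any ASCII-compatible encoding: å ä ö
SWEDISH = "\u00e5\u00e4\u00f6"

# What an editor writes to disk depends on the encoding it saves with.
latin1_bytes = SWEDISH.encode("iso-8859-1")  # one byte per letter
utf8_bytes = SWEDISH.encode("utf-8")         # two bytes per letter


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print("ISO-8859-1 bytes:", latin1_bytes.hex(" "))
print("UTF-8 bytes:     ", utf8_bytes.hex(" "))
print("ISO-8859-1 file read as UTF-8 is valid:", decodes_as_utf8(latin1_bytes))
print("UTF-8 file read as UTF-8 is valid:     ", decodes_as_utf8(utf8_bytes))
```

A file saved as ISO-8859-1 but labelled UTF-8 corresponds to the first of the two checks: the label promises one byte sequence and the file contains another.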

Currently, your test page is saved as ISO-8859-1, or at least
something (maybe windows-1252) that shares the same character codes for
these special characters:
> The Swedish characters in question are:
> Latin letter a with ring above = &aring; (å)
> Latin letter a with diaeresis = &auml; (ä)
> Latin letter o with diaeresis = &ouml; (ö)

If you set the character encoding manually with your browser to
iso-8859-1, the characters seem to display correctly. However, there
are other problems with your page that need addressing as well.

This is from the source code of your test page:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
....

1. The encoding in the <?xml?> processing instruction does not match
that in the <meta/> element. It's probably better to omit it anyway,
because it puts IE into quirks mode, especially when the document is
being served as text/html.

2. The doctype is XHTML 1.1, but it's being served as text/html. The
spec explicitly says that it *must not*, and should be served as
application/xhtml+xml, text/xml or application/xml. So, unless you have
the ability to set up content negotiation and serve HTML4.01 or XHTML
1.0 Strict to IE (and anything else that only supports text/html) and
XHTML 1.1 to others that do support application/xhtml+xml, then I
recommend you either write HTML 4.01 or XHTML 1.0 strict.
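
The content negotiation mentioned in point 2 could be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not a complete implementation of HTTP Accept handling: it only checks whether the client explicitly lists application/xhtml+xml with a non-zero q value, and otherwise falls back to text/html. The function name choose_content_type is hypothetical.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick a media type from an HTTP Accept header value (simplified sketch).

    Returns "application/xhtml+xml" only when the client lists it explicitly
    with q > 0; wildcards such as */* are deliberately ignored, so clients
    that do not name the type get "text/html".
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"


# Example header values (illustrative only, not captured from real browsers)
print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
print(choose_content_type("image/gif, image/jpeg, */*"))
```

Ignoring wildcards is a design choice for this sketch: a client that sends only */* has not shown that it can handle application/xhtml+xml.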

--
Lachlan Hunt
http://www.lachy.id.au/
lachlan.hunt@lachy.id.au.update.virus.scanners

Remove .update.virus.scanners to email me,
NO SPAM and NO VIRUSES!!!
#3
July 20th, 2005, 08:19 PM
Matt
Re: UTF-8 and Latin-1 characters

Arne wrote:
> Since I am Swedish, I write website content mostly in Swedish language
> and using charset iso-8859-1. I have (just for testing) tried to use
> utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html )
> but the special Swedish characters don't come out right if I don't use
> entities for them.
>
> So my question to anybody who can give an answer, is how it comes that
> I fail and somebody else can do it? I can't see anything different in
> this other page that could make it possible :(

1) Correct <?xml version="1.0" encoding="iso-8859-1"?>
(that's not the problem, but it needs changing)

I think it's your text editor. What are you using? Check that it is set to
utf-8 mode.

The characters appear as Chinese in my text editor, and as 'missing symbol' in
my browser.

--
Matt


#4
July 20th, 2005, 08:19 PM
Spartanicus
Re: UTF-8 and Latin-1 characters

Lachlan Hunt <lachlan.hunt@lachy.id.au.update.virus.scanners> wrote:
>2. The doctype is XHTML 1.1, but it's being served as text/html. The
>spec explicitly says that it *must not*

Incorrect.

--
Spartanicus
#5
July 20th, 2005, 08:19 PM
Arne
Re: UTF-8 and Latin-1 characters


Lachlan Hunt wrote:
> Arne wrote:
>
>
>>Since I am Swedish, I write website content mostly in Swedish language
>>and using charset iso-8859-1. I have (just for testing) tried to use
>>utf-8 on a test page ( http://w1.978.telia.com/~u97802964/test.html )
>>but the special Swedish characters don't come out right if I don't use
>>entities for them.
>
>
> To use UTF-8, you need to tell your editor to actually save the file
> as UTF-8. Simply declaring it in the <meta/> tag does not actually make
> it be encoded as UTF-8. For example, in Notepad on WinXP, choose
> File>Save As… and select UTF-8 in the Encoding list. Other editors that
> support UTF-8 will have the option available somewhere. This W3C I18N
> document explains more.
>
> http://www.w3.org/International/ques...lications.html

Thank's! I don't have WinXP (still on Win98), so my Notepad doesn't have
the character set option when saving. But I looked into Mozilla
Composer and could see it there. I never use Composer for editing
websites, so I had not seen it before :-)

Well, looking further in my editor for something similar to the option in
Composer, I found it! It is not in an obvious place, and the editor is
new to me, which is why I had not seen it before. But now I know
how to do it. Thank's for putting me on the track! :-)
> Currently, your test page is saved as ISO-8859-1, or at least
> something (maybe windows-1252) that shares the same character codes for
> these special characters:

Oh, sorry. The whole test page was a mess after my "experiments" with
the meta tags, when I tried different settings. :-)
> 2. The doctype is XHTML 1.1, but it's being served as text/html. The
> spec explicitly says that it *must not*, and should be served as
> application/xhtml+xml, text/xml or application/xml. So, unless you have
> the ability to set up content negotiation and serve HTML4.01 or XHTML
> 1.0 Strict to IE (and anything else that only supports text/html) and
> XHTML 1.1 to others that do support application/xhtml+xml, then I
> recommend you either write HTML 4.01 or XHTML 1.0 strict.

Well, as I learned it, the spec says that XHTML 1.1 should (but does not
have to) be served as application/xhtml+xml, text/xml or application/xml,
so I have just been testing it as text/html. Those test pages still
validate every time. But normally I use HTML 4.01 :-)

But I have also noticed that I don't really need the tag with
"Content-Type" and "charset" when I have the <?xml version="1.0"
encoding="utf-8"?> at the top of the page to get valid pages, and it works
fine in IE 6, but I don't know about other browsers and versions?

--
/Arne
#6
July 20th, 2005, 08:19 PM
Alan J. Flavell
Re: UTF-8 and Latin-1 characters

On Sun, 11 Jul 2004, Spartanicus wrote:
> Lachlan Hunt <lachlan.hunt@lachy.id.au.update.virus.scanners> wrote:
>
> >2. The doctype is XHTML 1.1, but it's being served as text/html. The
> >spec explicitly says that it *must not*
>
> Incorrect.

You'd both of you be doing the group a favour if you cited your
sources. But that's not the only side of this problem.

To the best of my knowledge, the XHTML/1.1 specification doesn't
address what content-type is suitable, but it seems to me that
http://www.w3.org/TR/2002/NOTE-xhtml...ypes-20020801/
while it uses the terms "should" (not) rather than "must" (not), makes
it clear enough that the use of text/html is appropriate only when the
compatibility rules of XHTML/1.0 Appendix C apply.

There is no corresponding compatibility recommendation for XHTML/1.1.
I can't see any point in leaving newcomers to believe that there's any
benefit in writing XHTML/1.1 and serving it as text/html.
Nit-picking over whether a W3C NOTE is authoritative or normative and
whether it uses the precise term MUST is an amusing digression in its
way, but it's hardly a useful piece of practical advice, which I think
would be useful here. (Except that it's all been said before -
and not just once).

#7
July 20th, 2005, 08:19 PM
Lachlan Hunt
Re: UTF-8 and Latin-1 characters

Arne wrote:
> But I have also noticed that I don't really need the tag with
> "Content-Type" and "charset" when I have the <?xml version="1.0"
> encoding="utf-8"?> in top of the page to get valid pages, and it works
> fine in IE 6 but I don't know about other browsers and versions?

Not if you're serving the document as text/html. In such cases, the
document is parsed as HTML/tag-soup, not XML, so the <?xml?> PI is
ignored, but it also sends IE into quirks mode, which can produce
different results, depending on what you're using, and the differences
between IE's quirks and more quirks (oops... I mean “standards
compliant”) modes :). Also, note that Appendix C [1] of the XHTML 1.0
spec recommends omitting the <?xml?> PI for compatibility with legacy
user agents.

Also, if you have the ability to do so, you should configure your
server to send the charset in the Content-Type HTTP header. At the
moment, it's only sending:

Content-Type: text/html

but it should be sending something like:

Content-Type: text/html; charset=utf-8
(after you correctly save your document as utf-8)

You should be able to configure that through a .htaccess file, if
your host allows. If that is done correctly, then you don't need to
declare it in the <meta/> tag, but note that if you're going to send the
document as X(HT)ML, then you shouldn't include the charset in the
Content-Type header [2], e.g.

Content-Type: application/xhtml+xml

and then you should either include the <?xml?> PI to declare the
charset, or else use only UTF-8 or UTF-16, because those are the defaults.

[1] http://www.w3.org/TR/xhtml1/#C_1

(This is only a draft, but it's still got some good advice in it)
[2] http://www.w3.org/TR/2004/WD-webarch...05/#no-charset
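
To check which charset, if any, a given Content-Type value declares, the header can be parsed as in the minimal Python sketch below. It uses the standard library's email.message.Message to read the charset parameter. The function name charset_from_content_type is invented for this example, and fetching the header from a live server is left out.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None.

    Hypothetical helper: it relies on the standard library's MIME header
    parsing, which returns the charset name in lower case.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


# The two header values quoted in the post above
print(charset_from_content_type("text/html"))
print(charset_from_content_type("text/html; charset=utf-8"))
```

A result of None for a text/html response means the browser has to fall back on the <meta/> declaration or on guessing.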

--
Lachlan Hunt
http://www.lachy.id.au/
lachlan.hunt@lachy.id.au.update.virus.scanners

Remove .update.virus.scanners to email me,
NO SPAM and NO VIRUSES!!!
#8
July 20th, 2005, 08:19 PM
Arne
Re: UTF-8 and Latin-1 characters


Lachlan Hunt wrote:
> Arne wrote:
>
>>But I have also noticed that I don't really need the tag with
>>"Content-Type" and "charset" when I have the <?xml version="1.0"
>>encoding="utf-8"?> in top of the page to get valid pages, and it works
>>fine in IE 6 but I don't know about other browsers and versions?
>
>
> Not if you're serving the document as text/html. In such cases, the
> document is parsed as HTML/tag-soup, not XML, so the <?xml?> PI is
> ignored, but it also sends IE into quirks mode, which can produce
> different results, depending on what you're using, and the differences
> between IE's quirks and more quirks (oops... I mean “standards
> compliant”) modes :). Also, note that Appendix C [1] of the XHTML 1.0
> spec recommends omitting the <?xml?> PI for compatibility with legacy
> user agents.
>
> Also, if you have the ability to do so, you should configure your
> server to send the charset in the Content-Type HTTP header. At the
> moment, it's only sending:
>
> Content-Type: text/html
>
> but it should be sending something like:
>
> Content-Type: text/html; charset=utf-8
> (after you correctly save your document as utf-8)
>
> You should be able to configure that through a .htaccess file, if
> your host allows. If that is done correctly, then you don't need to
> declare it in the <meta/> tag, but note that if you're going to send the
> document as X(HT)ML, then you shouldn't include the charset in the
> Content-Type header [2]. eg.
>
> Content-Type: application/xhtml+xml
>
> and then you should either include the <?xml?> PI to declare the
> charset, or you can only use UTF-8 or UTF-16 because they are the defaults.
>
> [1] http://www.w3.org/TR/xhtml1/#C_1
>
> (This is only a draft, but it's still got some good advice in it)
> [2] http://www.w3.org/TR/2004/WD-webarch...05/#no-charset

Thank's a lot for the tip's, I will save them for future reading :-)

I have a new test page (http://w1.978.telia.com/~u97802964/test2.html)
with different content but still for the same purpose. I can't use .htaccess
for the test file, as it is on my ISP's server, but it will be useful on
domain hosts.

I noticed after more testing with IE that it's very buggy at switching
encoding if I don't have the META with "Content-Type" and "charset" in
the file (I was forced to change it manually), so I put it back on the new
test page. I have also tried omitting the <?xml?> PI, and the page still
works and validates. But as I am only testing and learning, I leave it
on the test page.

I have done a lot of HTML, and have become interested in learning XHTML and
XML as it can be useful in the future. :-)
About the rendering mode, it's easy to see in Mozilla if a page is
rendering as standard or quirks mode just by looking at the "view page
info". But it's not possible to see how IE is rendering the same page?

Thank's again for your input!

--
/Arne
#9
July 20th, 2005, 08:19 PM
Stan Brown
Re: UTF-8 and Latin-1 characters

"Arne" <arne.luras@telia.com> wrote in
comp.infosystems.www.authoring.html:
>Thank's a lot for the tip's, I will save them for future reading :-)

You might enjoy the book EATS, SHOOTS & LEAVES by Lynne Truss.

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com
"Sticklers unite! You have nothing to lose but your sense of
proportion (and arguably you didn't have a lot of that to
begin with)." -- Lynne Truss, /Eats, Shoots & Leaves/
#10
July 20th, 2005, 08:19 PM
Martin Honnen
Re: UTF-8 and Latin-1 characters



Arne wrote:

> About the rendering mode, it's easy to see in Mozilla if a page is
> rendering as standard or quirks mode just by looking at the "view page
> info". But it's not possible to see how IE is rendering the same page?

You could use a bookmarklet to show the document.compatMode property:
javascript: alert(document.compatMode); void 0
In IE5/5.5, which doesn't have any strict mode (it could be said to be
always in quirks mode, with its IE-only box model), that should alert
undefined
With IE 6, if the page is in quirks mode, that should show
BackCompat
and if it is in strict mode it should show
CSS1Compat

--

Martin Honnen
http://JavaScript.FAQTs.com/

#11
July 20th, 2005, 08:19 PM
Arne
Re: UTF-8 and Latin-1 characters


Martin Honnen wrote:
>
> Arne wrote:
>
>
>
>>About the rendering mode, it's easy to see in Mozilla if a page is
>>rendering as standard or quirks mode just by looking at the "view page
>>info". But it's not possible to see how IE is rendering the same page?
>
>
> You could use a bookmarklet to show the document.compatMode property:
> javascript: alert(document.compatMode); void 0
> In IE5/5.5 which doesn't have any strict mode but could be said to be
> always in quirks mode with its IE only box model that should alert
> undefined
> with IE 6 if it is quirks mode that should show
> BackCompat
> and if it is strict mode it should show
> CSS1Compat

Thank's Martin for the tip, appreciate it!

--
/Arne
 
