what's the right encoding?

Problem: pasting characters from MS Word into a WYSIWYG editor (TinyMCE)

When we paste text from Word (i.e. MS Windows) into the browser, any
special characters in it, such as smart quotes but also accented
letters, show up incorrectly.

What we see is, for example:

=============
turtle species are ¬"endangered¬" or ¬"critically endangered¬"
=============

The strangest thing is that on our test server, where we have the same
website running, the problem does *not* occur, but we cannot find any
difference between the two setups.

The head of our documents is using:

=================
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
=================

In a newer version of our system we use:
=================
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US"
xml:lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
=================

But that doesn't change anything. The problem still occurs.

Where do these characters come from? How do we prevent this from
happening?

Michiel

Mar 30 '06 #1


On 30 Mar 2006, Michiel wrote:
> problem: pasting characters from MSWord into wysiwyg editor (tinyMCE)
> When we paste text from Word (i.e. MSWindows) to the browser, and in
> the text is any special character, like smart quotes, but also accented
> letters, they show up incorrectly.
I don't know what "tinyMCE" is - but you need a "Unicode-savvy"
editor when pasting characters from MS Word. If I'm not mistaken,
MS Word encodes characters in UTF-8 or UTF-16.
Try Mozilla's Composer, for example.
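To make the point about encodings concrete, here is a small Python sketch (Python is my choice here; nothing in the thread uses it) that shows the bytes the opening smart quote U+201C turns into under a few encodings. It does not show which encoding Word or the Windows clipboard actually hands to the browser; it only shows that the same character has different bytes in each encoding, and that ISO-8859-1, the charset declared in the original pages, has no byte for it at all.

```python
# A minimal sketch: one character, several encodings.
LEFT_QUOTE = "\u201c"  # the "smart" opening quote Word typically inserts


def encoded_forms(ch: str) -> dict:
    """Return the bytes for `ch` in a few encodings (None if unencodable)."""
    forms = {}
    for enc in ("utf-8", "utf-16-le", "cp1252", "iso-8859-1"):
        try:
            forms[enc] = ch.encode(enc)
        except UnicodeEncodeError:
            forms[enc] = None  # this encoding simply cannot represent ch
    return forms


if __name__ == "__main__":
    for enc, data in encoded_forms(LEFT_QUOTE).items():
        print(f"{enc:>10}: {data!r}")
```

cp1252 (Windows-1252) does have a byte for the smart quote (0x93), which is one reason text that looks fine on Windows breaks once it is labelled as ISO-8859-1.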
> The head of our documents is using:
> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
This is pointless. Instead, make sure that your *webserver*
sends the correct Content-Type *including* the charset parameter.
http://www.w3.org/International/O-HTTP-charset.html
http://ppewww.ph.gla.ac.uk/~flavell/...t/ns-burp.html
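A minimal sketch of what "the webserver sends the charset" means, using Python's standard http.server purely for illustration and assuming the site is switched to UTF-8 (the page text, port 8000 and the helper name `encode_page` are made up). The point is that the body is encoded with the same charset the Content-Type header declares.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

CHARSET = "utf-8"  # assumption: the site has been switched to UTF-8
PAGE = "<!DOCTYPE html><title>Test</title><p>\u201cendangered\u201d</p>"  # made-up page


def encode_page(page: str, charset: str = CHARSET) -> Tuple[bytes, str]:
    """Encode the page and build a Content-Type value naming the same charset."""
    return page.encode(charset), f"text/html; charset={charset}"


class CharsetHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body, content_type = encode_page(PAGE)
        self.send_response(200)
        self.send_header("Content-Type", content_type)  # the real HTTP header
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Serves on http://127.0.0.1:8000/ until interrupted.
    HTTPServer(("127.0.0.1", 8000), CharsetHandler).serve_forever()
```

On a real site this is server configuration rather than code; if the live server happens to be Apache, directives such as AddDefaultCharset or AddCharset are the usual place to look.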
> In a newer version of our system we use:
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" lang="en-US"
> xml:lang="en-US">


Why? Because XHTML is "newer"?
XHTML 1.0 *Transitional* is only used by the clueless.
If you use Transitional, you could equally well take HTML 3.2.
Take a look (or two) at
http://ppewww.ph.gla.ac.uk/~flavell/charset/

--
All free men, wherever they may live, are citizens of Denmark.
And therefore, as a free man, I take pride in the words "Jeg er dansker!"

Mar 30 '06 #2

On 30 Mar 2006 07:19:26 -0800, "Michiel" <bl****@gmail.com> wrote:
> When we paste text from Word (i.e. MSWindows) to the browser, and in
> the text is any special character, like smart quotes, but also accented
> letters, they show up incorrectly.


Ask Word to change the encoding of the characters to UTF-8, and then
declare that as your character set.
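If the text reaches you as a file that Word saved as Windows-1252 (an assumption; check what Word actually offers when saving), converting it to UTF-8 is a decode followed by an encode. A Python sketch; the function name and the file names in the usage comment are placeholders:

```python
import sys
from pathlib import Path


def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode Windows-1252 bytes as UTF-8.

    Raises UnicodeDecodeError for the few byte values cp1252 leaves undefined
    (0x81, 0x8D, 0x8F, 0x90, 0x9D), which usually means the input was not cp1252.
    """
    text = data.decode("cp1252")  # bytes -> characters
    return text.encode("utf-8")   # characters -> UTF-8 bytes


if __name__ == "__main__":
    # Usage: python to_utf8.py input.txt output.txt  (placeholder file names)
    src, dst = Path(sys.argv[1]), Path(sys.argv[2])
    dst.write_bytes(cp1252_to_utf8(src.read_bytes()))
```

The converted bytes then need to be served with a header that says UTF-8, otherwise the browser will misread them in exactly the way described above.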

Ian
--
http://sundry.ws/
Mar 30 '06 #3

Michiel wrote:
> problem: pasting characters from MSWord into wysiwyg editor (tinyMCE)
>
> When we paste text from Word (i.e. MSWindows) to the browser, and in
> the text is any special character, like smart quotes, but also accented
> letters, they show up incorrectly.
>
> What we see is e.g.
>
> =============
> turtle species are ¬"endangered¬" or ¬"critically endangered¬"
Looks like mislabelled utf-8. Configure your server to label it
correctly.
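The "mislabelled UTF-8" diagnosis can be reproduced by encoding text as UTF-8 and decoding the bytes as Windows-1252, which is roughly what a browser does when UTF-8 arrives labelled as ISO-8859-1 (browsers commonly treat that label as Windows-1252). A Python sketch; `mislabel` is a hypothetical name:

```python
def mislabel(text: str, actual: str = "utf-8", declared: str = "cp1252") -> str:
    """Simulate a page whose bytes are `actual` but which is read as `declared`."""
    raw = text.encode(actual)                      # what the server really sends
    return raw.decode(declared, errors="replace")  # what the browser displays


if __name__ == "__main__":
    print(mislabel("\u201cendangered\u201d"))  # smart quotes turn into junk
    print(mislabel("caf\u00e9"))               # accented letters too
```

Each smart quote becomes three junk characters because UTF-8 uses three bytes for it, and "é" comes out as "Ã©". The exact junk depends on which encoding the browser falls back to, so it need not match the sample above character for character.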
> The head of our documents is using:
>
> =================
> <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Real HTTP headers matter. That crap (normally) doesn't.
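Since the header is what counts, a plausible first step with the "works on the test server, not on the live one" puzzle is to compare the charset each server actually sends. A Python sketch using only the standard library; `declared_charset` and `server_charset` are hypothetical helper names, and the URLs are whatever you pass on the command line:

```python
import sys
import urllib.request
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the (lower-cased) charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def server_charset(url: str) -> Optional[str]:
    """Fetch `url` and report the charset its HTTP Content-Type header declares."""
    with urllib.request.urlopen(url) as resp:
        ct = resp.headers.get("Content-Type")
        return declared_charset(ct) if ct else None


if __name__ == "__main__":
    # e.g. python check_charset.py http://test.example/page http://www.example/page
    for url in sys.argv[1:]:
        print(url, "->", server_charset(url))
```

If the two servers report different charsets (or one reports none), that would explain why identical pages behave differently.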

> In a newer version of our system we use:
> =================
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


Newer? That's for legacy stuff. That is to say, stuff that
was marked legacy in 1997/98.

--
Nick Kew
Mar 30 '06 #4

On Thu, 30 Mar 2006, Ian Rastall wrote:
> Ask Word to change the encoding of the characters to UTF-8, and then
> declare that as your character set.
                       ^^^^^^^^^^^^^

For "character set", read "encoding" (as the subject line correctly
stated it already), specified in XHTML by "encoding", and in HTTP by
the old (and now somewhat misleading) MIME attribute called "charset".

In HTML/XHTML, the "document character set" is *always*
iso-10646/unicode, irrespective of the external character encoding.
There's an important principle at stake.
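A short Python sketch of that principle: the encoding only decides which bytes go over the wire, while a numeric character reference such as &#8220; always names a Unicode code point, even inside an ISO-8859-1 document. The helper names are made up for illustration.

```python
import html
from typing import List


def to_latin1_html(text: str) -> bytes:
    """Encode text as ISO-8859-1, writing characters outside that encoding
    as numeric character references (which always name Unicode code points)."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


def code_points(html_text: str) -> List[str]:
    """Resolve character references and list the resulting code points."""
    return [f"U+{ord(ch):04X}" for ch in html.unescape(html_text)]


if __name__ == "__main__":
    raw = to_latin1_html("\u201c\u00e9\u201d")
    print(raw)                                    # bytes as an ISO-8859-1 page would carry them
    print(code_points(raw.decode("iso-8859-1")))  # the characters the document means
```

Whichever encoding carries them, the reader ends up with the same three code points, U+201C, U+00E9 and U+201D.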

cheers
Apr 3 '06 #5

On Mon, 3 Apr 2006 17:27:23 +0100, "Alan J. Flavell"
<fl*****@physics.gla.ac.uk> wrote:
> There's an important principle at stake.


I agree. It was just a bit of sloppy writing. I shall go back to
Jukka's page on ... that subject ... and get the terminology correct
yet again.

Ian
--
http://sundry.ws/
Apr 3 '06 #6
