UTF-8 encoding decoding not working with Danish characters

Hi all,
I am new to XML, but I use it for an RSS feed.

I have one problem, which I have really been struggling with.

My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.

However, the Danish special characters appear wrong.

For example the letter å becomes "Ã¥", the letter ø becomes "Ã¸"

See an example here:
http://netm.dk/blog/rss/index_rss2.xml

I thought that it could be because the encoding was not set in the document,
so I added this:
<?xml version="1.0" encoding="UTF-8" ?>
However, that did not make any difference, as can be seen here:
http://netm.dk/blog/rss/test_rss2.xml

The text decodes correctly on my regular web pages on http://netm.dk/

What am I doing wrong?

Regards,
Lars
www.netm.dk


Jul 20 '05 #1
This is not limited to XML. I try to send JavaMail mails. When doing
this from a Windows PC, Danish characters are garbled, when running the
exact same program on Linux, the characters get through fine.

Hope we get rid of those %@ darned NLS issues sometime in my lifetime,
but I doubt it.
Jul 20 '05 #2
LarsM wrote:
My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.
You have to take care that *every* tool in the toolchain
knows how to handle utf-8 correctly. Maybe you give us
a list of tools involved ?
The text decodes correctly on my regular web pages on http://netm.dk/


Your web page looks OK to me.
I bet it is in the database or shortly thereafter.
Jul 20 '05 #3

"Jürgen Kahrs" wrote:

Maybe you give us a list of tools involved ?


Thanks Jürgen,
The RSS feed is being generated by the same Blog application
("Boastmachine"), which I use to generate the Web pages. As far as I know it
accesses the database in the same way as for the "real" pages.
But I will check up on that.
-Lars


Jul 20 '05 #4
LarsM wrote:
The RSS feed is being generated by the same Blog application
("Boastmachine"), which I use to generate the Web pages. As far as I know it
accesses the database in the same way as for the "real" pages.
So the problem should be in the Blog application.
But I will check up on that.


Good idea. Maybe there is simply a bug in the RSS
extraction mechanism.
Jul 20 '05 #5
Pointing my (Linux) Firefox browser at your web site, and having
encoding set to utf-8, I see your page fine. Setting encoding to
ISO-8859-1 generates the Ã¥ stuff. One never knows how the users'
browsers are set up.

Look at this page: www.vietbao.com

Great looking, authentic, Vietnamese fonts with utf-8. Obviously not
looking good with iso (vn fonts not part of iso..).
Jul 20 '05 #6

"Malte" wrote:
Pointing my (Linux) Firefox browser at your web site, and having encoding
set to utf-8, I see your page fine. Setting encoding to ISO-8859-1
generates the Ã¥ stuff. One never knows how the users' browsers are set up.


Is that looking at http://netm.dk/blog/rss/index_rss2.xml also?

-Lars
Jul 20 '05 #7
LarsM wrote:
Hi all,
I am new to XML, but I use it for an RSS feed.

I have one problem, which I have really been struggling with.

My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.


No. It's ASCII encoded before an agent even looks at the document itself.
See RFC3023 for details.

The good news is that the fix is a single line in httpd.conf.

--
Nick Kew
Jul 20 '05 #8
/LarsM/:
My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.

However, the Danish special characters appear wrong.

For example the letter å becomes "Ã¥", the letter ø becomes "Ã¸"

See an example here:
http://netm.dk/blog/rss/index_rss2.xml


Sounds like a MySQL configuration issue to me.

--
Stanimir
Jul 20 '05 #9

"Nick Kew" wrote:

The good news is that the fix is a single line in httpd.conf.


I don't have my own Apache server, but am using an ISP (Freepaq.dk). Where
can I make the configuration change, then?

-Lars
Jul 20 '05 #10
In article <42***********************@dread15.news.tele.dk> ,
"LarsM" <ma**************@TAKETHISAWAYnetm.dk> wrote:
I don't have my own Apache server, but am using an ISP (Freepaq.dk). Where
can I make the configuration change, then?


In a .htaccess file if your host allows it. Failing that, you could ask
your host to map .xml to application/xml. Failing that, I recommend
switching to another host.
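
For reference, a minimal sketch of what such a .htaccess file could contain. It assumes the host runs Apache with mod_mime and allows FileInfo overrides; neither is confirmed in this thread. The AddCharset line is an optional extra that is not part of the suggestion above.

```apache
# .htaccess sketch; only takes effect if the host permits
# FileInfo overrides (AllowOverride FileInfo) for this directory.

# Serve .xml files as application/xml.
AddType application/xml .xml

# Optionally also declare the charset in the Content-Type header.
AddCharset UTF-8 .xml
```

Fetching the feed's headers again (for example with lynx -head) should show whether the change took effect.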

--
Henri Sivonen
hs******@iki.fi
http://iki.fi/hsivonen/
Jul 20 '05 #11
LarsM wrote:
"Malte" wrote:
Pointing my (Linux) Firefox browser at your web site, and having encoding
set to utf-8, I see your page fine. Setting encoding to ISO-8859-1
generates the Ã¥ stuff. One never knows how the users' browsers are set up.

Is that looking at http://netm.dk/blog/rss/index_rss2.xml also?

-Lars


That gives me the funny looking chars as well, regardless of encoding
settings in the browser.

BTW, solved my JavaMail NLS problem. Had Tomcat start with the
-DEncoding parm set.
Jul 20 '05 #12

"Henri Sivonen" wrote:
In a .htaccess file if your host allows it. Failing that, you could ask
your host to map .xml to application/xml. Failing that, I recommend
switching to another host.


I've been reading through the RFC, but please enlighten me. What would the
syntax be for setting this? Please be as specific as possible.

Regards,
Lars
Jul 20 '05 #13
Hi there
Henri Sivonen wrote:
In a .htaccess file if your host allows it. Failing that, you could ask
your host to map .xml to application/xml. Failing that, I recommend
switching to another host.


lynx -head http://netm.dk/blog/rss/index_rss2.xml
HTTP/1.0 200 OK
Date: Thu, 10 Feb 2005 14:46:31 GMT
Server: Apache/1.3.33 (Unix) mod_perl/1.29 DAV/1.0.3 mod_gzip/1.3.26.1a
PHP/4.3.9
Last-Modified: Tue, 08 Feb 2005 08:03:13 GMT
ETag: "bd67c1-1141-42087241"
Accept-Ranges: bytes
Content-Length: 4417
Content-Type: application/xml
Age: 704
X-Cache: HIT from www.sput.nl
X-Cache-Lookup: HIT from www.sput.nl:8080
Proxy-Connection: close

lynx -head http://netm.dk/blog/rss/test_rss2.xml
HTTP/1.0 200 OK
Date: Thu, 10 Feb 2005 14:48:18 GMT
Server: Apache/1.3.33 (Unix) mod_perl/1.29 DAV/1.0.3 mod_gzip/1.3.26.1a
PHP/4.3.9
Last-Modified: Mon, 07 Feb 2005 18:45:44 GMT
ETag: "11e2dc0-1022-4207b758"
Accept-Ranges: bytes
Content-Length: 4130
Content-Type: application/xml
Age: 624
X-Cache: HIT from www.sput.nl
X-Cache-Lookup: HIT from www.sput.nl:8080
Proxy-Connection: close

This one on my box;
lynx -head http://www.sput.nl/software/leased-line/leased-line.xml
HTTP/1.1 200 OK
Date: Thu, 10 Feb 2005 15:00:02 GMT
Server: Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2
Last-Modified: Sun, 30 Jan 2005 07:44:42 GMT
ETag: "2787c-4840-41fc906a"
Accept-Ranges: bytes
Content-Length: 18496
Connection: close
Content-Type: text/xml; charset=UTF-8

However, my browser does consider all these files to be UTF-8 XML.
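
One way to read these headers against RFC 3023, as a rough sketch: text/xml without a charset parameter is treated as us-ascii, while application/xml without one falls back to the encoding declared in the document itself (UTF-8 if nothing is declared). The function name effective_xml_charset is made up for illustration, and byte order mark detection is ignored.

```python
from email.message import Message
from typing import Optional


def effective_xml_charset(content_type: str,
                          declared_encoding: Optional[str] = None) -> str:
    """Hypothetical helper: which charset would an RFC 3023 consumer assume?

    content_type      -- value of the HTTP Content-Type header
    declared_encoding -- encoding from the <?xml ... ?> declaration, if any
    """
    msg = Message()
    msg["Content-Type"] = content_type

    # An explicit charset parameter in the header always wins.
    charset = msg.get_param("charset")
    if charset:
        return str(charset).lower()

    # text/xml with no charset parameter defaults to us-ascii.
    if msg.get_content_type() == "text/xml":
        return "us-ascii"

    # Otherwise (e.g. application/xml) use the XML declaration,
    # falling back to UTF-8 when the document declares nothing.
    return (declared_encoding or "utf-8").lower()


print(effective_xml_charset("application/xml", "UTF-8"))   # the two netm.dk feeds
print(effective_xml_charset("text/xml; charset=UTF-8"))    # the sput.nl file
print(effective_xml_charset("text/xml", "UTF-8"))          # header overrides the declaration
```

Under these simplified rules all three files listed above come out as UTF-8, which matches what the browser does.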
Regards,
Rob
--
+----------------------------------------------------------------------+
| The EU constitution will turn the EU into a SU |
| Vote against the EU constitution in the referendum |
+----------------------------------------------------------------------+
Jul 20 '05 #14
Sorry, but exactly how do I set that setting, which Nick Kew and Henri
Sivonen suggested?

I have been reading through the RFC, but it is not completely clear to me...

Cheers,
Lars
www.netm.dk
Jul 20 '05 #15
/LarsM/:
My XML document is generated from the contents of a MySQL database. It is UTF-8 encoded.

However, the Danish special characters appear wrong.

For example the letter å becomes "Ã¥", the letter ø becomes "Ã¸"

See an example here:
http://netm.dk/blog/rss/index_rss2.xml


Sounds like a MySQL configuration issue to me.


Sorry, but exactly how do I set that setting, which Nick Kew and Henri
Sivonen suggested?

I have been reading through the RFC, but it is not completely clear to me...


Please, quote at least some relevant text from the post you're
replying to.

What I meant is that, AFAIK, MySQL versions prior to 4.1 don't handle
Unicode characters. I have no experience with version 4.1, but it
seems the encoding configuration could be tricky with it, too.

It could happen that a text is inserted into the DB using some
encoding and read using another (depending on the connection driver
configuration) producing different results. So, I guess, somehow the
info is inserted UTF-8 encoded but then read using ISO-8859-1, for
example. Generally it has nothing to do with RFCs but MySQL specific
configuration.
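
If the stored data really was written as UTF-8 and read back as ISO-8859-1, the damage is usually reversible. A minimal Python sketch of that repair follows; the helper name repair_mojibake is invented for illustration, and it assumes exactly one wrong ISO-8859-1 decoding step, which is a guess about this setup.

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 bytes decoded as ISO-8859-1'.

    If the text does not look like that kind of damage, it is
    returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside ISO-8859-1, or the
        # recovered bytes are not valid UTF-8: leave it alone.
        return text


print(repair_mojibake("rÃ¸d grÃ¸d"))  # damaged input
print(repair_mojibake("rød grød"))    # already clean, returned as-is
```

A repair like this only treats the symptom. The lasting fix is to make the connection encoding match the encoding the data was stored in.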

I've worked on an application which used MySQL 4.0 as data store and
because it was targeted for the Japanese market we had to configure
the connection driver specifically to encode/decode using a
Shift_JIS encoding.

--
Stanimir
Jul 20 '05 #16

"Stanimir Stamenkov" wrote :


What I meant is that, AFAIK, MySQL versions prior to 4.1 don't handle Unicode
characters. I have no experience with version 4.1, but it seems the
encoding configuration could be tricky with it, too.

Thank you Stanimir. I think my Web host is on 4.0 only. I will look into
that and maybe go for another encoding all the way through...
Sorry about not quoting correctly...

Regards,
Lars
www.netm.dk
Jul 20 '05 #17
Hi there
LarsM wrote:
I am new to XML, but I use it for an RSS feed.

I have one problem, which I have really been struggling with.

My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.

However, the Danish special characters appear wrong.

For example the letter å becomes "Ã¥", the letter ø becomes "Ã¸"


In ISO-8859-1 a-ring is 0xE5, in UTF-8 0xC3 0xA5
0xC3 0xA5 in ISO-8859-1 is A-tilde Yen.
The same applies to the other example.

So maybe the data gets stored as UTF-8 but retrieved as ISO-8859-1 and
then converted to UTF-8.
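
The byte values can be checked with a few lines of Python. This is only an illustration of the suspected chain; the function name misread_as_latin1 is made up here.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


for letter in ("å", "ø"):
    latin1 = letter.encode("iso-8859-1")   # one byte per letter
    utf8 = letter.encode("utf-8")          # two bytes per letter
    garbled = misread_as_latin1(letter)    # two characters instead of one
    # Converting the garbled text to UTF-8 again doubles the byte count.
    twice = garbled.encode("utf-8")
    print(letter, latin1.hex(), utf8.hex(), garbled, twice.hex())
```

For a-ring the final column is c3 83 c2 a5: valid UTF-8, but for the wrong two characters, so no browser setting can show it correctly.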
Vr.Gr,
Rob
--
+----------------------------------------------------------------------+
| The EU constitution will turn the EU into a SU |
| Vote against the EU constitution in the referendum |
+----------------------------------------------------------------------+
Jul 20 '05 #18
On Thu, 10 Feb 2005, LarsM wrote:
X-Newsreader: Microsoft Outlook Express 6.00.2900.2180

However, the Danish special characters appear wrong.
For example the letter ? becomes "??", the letter ? becomes "??"


As long as you are unable to post special, non-ASCII characters
with appropriate MIME header in your newsreader^W Outlook Express,
don't expect anything.

You need to make these settings:

Tools > Options > Send
Mail Sending Format > Plain Text Settings > Message format MIME
News Sending Format > Plain Text Settings > Message format MIME
Encode text using: None

Better yet, get a newsreader instead of OE.

--
Top-posting.
What's the most irritating thing on Usenet?

Jul 20 '05 #19
