UTF-8 encoding decoding not working with Danish characters

Hi all,
I am new to XML, but I use it for an RSS feed.

I have one problem, which I have really been struggling with.

My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.

However, the Danish special characters appear wrong.

For example the letter å becomes "Ã¥", the letter ø becomes "Ã¸"

See an example here:
http://netm.dk/blog/rss/index_rss2.xml

I thought that it could be because the encoding was not set in the document,
so I added this:
<?xml version="1.0" encoding="UTF-8" ?>
However, that did not make any difference, as can be seen here:
http://netm.dk/blog/rss/test_rss2.xml

The text decodes correctly on my regular web pages on http://netm.dk/

What am I doing wrong?

Regards,
Lars
www.netm.dk


Jul 20 '05 #1
This is not limited to XML. I am trying to send mail with JavaMail. When
I do this from a Windows PC, Danish characters are garbled; when I run the
exact same program on Linux, the characters get through fine.

Hope we get rid of those ¤%@£¥ darned NLS issues sometime in my lifetime,
but I doubt it.
Jul 20 '05 #2
LarsM wrote:
My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.
You have to take care that *every* tool in the toolchain
knows how to handle UTF-8 correctly. Could you give us
a list of the tools involved?
The text decodes correctly on my regular web pages on http://netm.dk/


Your web page looks OK to me.
I bet the problem is in the database or shortly after it.
Jul 20 '05 #3

"Jürgen Kahrs" wrote:

Could you give us a list of the tools involved?


Thanks Jürgen,
The RSS feed is being generated by the same Blog application
("Boastmachine"), which I use to generate the Web pages. As far as I know it
accesses the database in the same way as for the "real" pages.
But I will check up on that.
-Lars


Jul 20 '05 #4
LarsM wrote:
The RSS feed is being generated by the same Blog application
("Boastmachine"), which I use to generate the Web pages. As far as I know it
accesses the database in the same way as for the "real" pages.
So the problem should be in the Blog application.
But I will check up on that.


Good idea. Maybe there is simply a bug in the RSS
extraction mechanism.
Jul 20 '05 #5
Pointing my (Linux) Firefox browser at your web site with the
encoding set to UTF-8, I see your page fine. Setting the encoding to
ISO-8859-1 produces the "Ã¥" garbage. One never knows how users'
browsers are set up.

Look at this page: www.vietbao.com

Great-looking, authentic Vietnamese text with UTF-8. Obviously it does
not look good with ISO-8859-1 (Vietnamese characters are not part of
that character set).
Jul 20 '05 #6

"Malte" wrote:
Pointing my (Linux) Firefox browser at your web site with the encoding
set to UTF-8, I see your page fine. Setting the encoding to ISO-8859-1
produces the "Ã¥" garbage. One never knows how users' browsers are set up.


Is that looking at http://netm.dk/blog/rss/index_rss2.xml also?

-Lars
Jul 20 '05 #7
LarsM wrote:
Hi all,
I am new to XML, but I use it for an RSS feed.

I have one problem, which I have really been struggling with.

My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.


No. It's ASCII encoded before an agent even looks at the document itself.
See RFC3023 for details.

The good news is that the fix is a single line in httpd.conf.

--
Nick Kew
Jul 20 '05 #8
/LarsM/:
My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.

However, the Danish special characters appear wrong.

For example the letter å becomes "Ã¥", the letter ø becomes "Ã¸"

See an example here:
http://netm.dk/blog/rss/index_rss2.xml


Sounds like a MySQL configuration issue to me.

--
Stanimir
Jul 20 '05 #9

"Nick Kew" wrote:

The good news is that the fix is a single line in httpd.conf.


I don't have my own Apache server, but am using an ISP (Freepaq.dk). Where
can I make the configuration change, then?

-Lars
Jul 20 '05 #10
In article <42***********************@dread15.news.tele.dk> ,
"LarsM" <ma**************@TAKETHISAWAYnetm.dk> wrote:
I don't have my own Apache server, but am using an ISP (Freepaq.dk). Where
can I make the configuration change, then?


In a .htaccess file if your host allows it. Failing that, you could ask
your host to map .xml to application/xml. Failing that, I recommend
switching to another host.
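
For reference, a minimal sketch of what such a .htaccess file might contain. It assumes the host loads mod_mime and allows FileInfo overrides in .htaccess, neither of which is confirmed anywhere in this thread. AddType and AddCharset are standard mod_mime directives; the .xml extension is taken from the feed URLs above.

```apache
# .htaccess in the directory that holds the feed files
# (assumes mod_mime is loaded and AllowOverride FileInfo is in effect)

# Serve .xml files as application/xml
AddType application/xml .xml

# Label .xml files as UTF-8 in the Content-Type header
AddCharset UTF-8 .xml
```

With both lines in place, the response header should read something like "Content-Type: application/xml; charset=UTF-8". This only labels the bytes; it does not repair text that was already converted wrongly before it reached the file.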

--
Henri Sivonen
hs******@iki.fi
http://iki.fi/hsivonen/
Jul 20 '05 #11
LarsM wrote:
"Malte" wrote:
Pointing my (Linux) Firefox browser at your web site with the encoding
set to UTF-8, I see your page fine. Setting the encoding to ISO-8859-1
produces the "Ã¥" garbage. One never knows how users' browsers are set up.

Is that looking at http://netm.dk/blog/rss/index_rss2.xml also?

-Lars


That gives me the funny looking chars as well, regardless of encoding
settings in the browser.

BTW, I solved my JavaMail NLS problem: I had Tomcat start with the
-DEncoding parameter set.
Jul 20 '05 #12

"Henri Sivonen" wrote:
In a .htaccess file if your host allows it. Failing that, you could ask
your host to map .xml to application/xml. Failing that, I recommend
switching to another host.


I've been reading through the RFC, but please enlighten me. What would the
syntax be for setting this? Please be as specific as possible.

Regards,
Lars
Jul 20 '05 #13
Hi there
Henri Sivonen wrote:
In a .htaccess file if your host allows it. Failing that, you could ask
your host to map .xml to application/xml. Failing that, I recommend
switching to another host.


lynx -head http://netm.dk/blog/rss/index_rss2.xml
HTTP/1.0 200 OK
Date: Thu, 10 Feb 2005 14:46:31 GMT
Server: Apache/1.3.33 (Unix) mod_perl/1.29 DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.9
Last-Modified: Tue, 08 Feb 2005 08:03:13 GMT
ETag: "bd67c1-1141-42087241"
Accept-Ranges: bytes
Content-Length: 4417
Content-Type: application/xml
Age: 704
X-Cache: HIT from www.sput.nl
X-Cache-Lookup: HIT from www.sput.nl:8080
Proxy-Connection: close

lynx -head http://netm.dk/blog/rss/test_rss2.xml
HTTP/1.0 200 OK
Date: Thu, 10 Feb 2005 14:48:18 GMT
Server: Apache/1.3.33 (Unix) mod_perl/1.29 DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.9
Last-Modified: Mon, 07 Feb 2005 18:45:44 GMT
ETag: "11e2dc0-1022-4207b758"
Accept-Ranges: bytes
Content-Length: 4130
Content-Type: application/xml
Age: 624
X-Cache: HIT from www.sput.nl
X-Cache-Lookup: HIT from www.sput.nl:8080
Proxy-Connection: close

This one is on my box:
lynx -head http://www.sput.nl/software/leased-line/leased-line.xml
HTTP/1.1 200 OK
Date: Thu, 10 Feb 2005 15:00:02 GMT
Server: Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2
Last-Modified: Sun, 30 Jan 2005 07:44:42 GMT
ETag: "2787c-4840-41fc906a"
Accept-Ranges: bytes
Content-Length: 18496
Connection: close
Content-Type: text/xml; charset=UTF-8

However, my browser does consider all these files to be UTF-8 XML.
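
The difference between the headers above can be sketched in a few lines of Python. This is only an illustration of the RFC 3023 defaults as I understand them: text/xml without a charset parameter defaults to US-ASCII, while application/xml without one leaves the decision to the XML declaration or byte order mark. The function name effective_charset and the marker string "from-xml-declaration" are made up for this example.

```python
from email.message import Message


def effective_charset(content_type: str) -> str:
    """Return the charset implied by a Content-Type header value.

    Hypothetical helper: follows the RFC 3023 defaults for XML media types.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if charset:
        # An explicit charset parameter always wins.
        return str(charset).lower()
    if msg.get_content_type() == "text/xml":
        # text/xml with no charset parameter defaults to US-ASCII.
        return "us-ascii"
    # application/xml with no charset: the parser looks at the BOM
    # or the encoding="..." pseudo-attribute in the XML declaration.
    return "from-xml-declaration"


print(effective_charset("text/xml; charset=UTF-8"))
print(effective_charset("application/xml"))
print(effective_charset("text/xml"))
```

By this reading, the two netm.dk feeds (application/xml, no charset) leave the decision to the document itself, so the header alone would not force a wrong decoding.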
Regards,
Rob
--
+----------------------------------------------------------------------+
| The EU constitution will turn the EU into a SU |
| Vote against the EU constitution in the referendum |
+----------------------------------------------------------------------+
Jul 20 '05 #14
Sorry, but exactly how do I make the setting that Nick Kew and Henri
Sivonen suggested?

I have been reading through the RFC, but it is not completely clear to me...

Cheers,
Lars
www.netm.dk
Jul 20 '05 #15
/LarsM/:
My XML document is generated from the contents of a MySQL database. It is UTF-8 encoded.

However, the Danish special characters appear wrong.

For example the letter å becomes "Ã¥", the letter ø becomes "Ã¸"

See an example here:
http://netm.dk/blog/rss/index_rss2.xml


Sounds like a MySQL configuration issue to me.


Sorry, but exactly how do I make the setting that Nick Kew and Henri
Sivonen suggested?

I have been reading through the RFC, but it is not completely clear to me...


Please, quote at least some relevant text from the post you're
replying to.

What I meant is that, AFAIK, MySQL versions prior to 4.1 don't handle
Unicode characters. I have no experience with version 4.1, but it
seems the encoding configuration could be tricky with it, too.

It can happen that text is inserted into the DB using one encoding
and read back using another (depending on the connection driver
configuration), producing different results. So I guess the data is
somehow inserted UTF-8 encoded but then read as ISO-8859-1, for
example. Generally this has nothing to do with RFCs but with
MySQL-specific configuration.

I've worked on an application which used MySQL 4.0 as its data store,
and because it was targeted at the Japanese market we had to configure
the connection driver specifically to encode/decode using the
Shift_JIS encoding.

--
Stanimir
Jul 20 '05 #16

"Stanimir Stamenkov" wrote :


What I meant is that, AFAIK, MySQL versions prior to 4.1 don't handle
Unicode characters. I have no experience with version 4.1, but it seems
the encoding configuration could be tricky with it, too.

Thank you Stanimir. I think my Web host is on 4.0 only. I will look into
that and maybe go for another encoding all the way through...
Sorry about not quoting correctly...

Regards,
Lars
www.netm.dk
Jul 20 '05 #17
Hi there
LarsM wrote:
I am new to XML, but I use it for an RSS feed.

I have one problem, which I have really been struggling with.

My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.

However, the Danish special characters appear wrong.

For example the letter å becomes "Ã¥", the letter ø becomes "Ã¸"


In ISO-8859-1, a-ring (å) is 0xE5; in UTF-8 it is 0xC3 0xA5.
Read as ISO-8859-1, 0xC3 0xA5 is A-tilde (Ã) followed by the yen sign (¥).
The same applies to the other example.

So maybe the data gets stored as UTF-8 but retrieved as ISO-8859-1 and
then converted to UTF-8.
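
The byte arithmetic above can be reproduced with a short Python sketch. The function names garble and repair are invented for this illustration, and the repair step only works if the text was mis-decoded exactly once, which is an assumption about Lars's data rather than something established in this thread.

```python
def garble(text: str) -> str:
    """Encode as UTF-8, then misread the resulting bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Undo one round of garble(): recover the bytes, then decode as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


# a-ring is the single byte E5 in ISO-8859-1 but two bytes in UTF-8.
print("å".encode("iso-8859-1").hex())  # e5
print("å".encode("utf-8").hex())       # c3a5

# Misreading the two UTF-8 bytes as ISO-8859-1 gives A-tilde plus yen sign.
print(garble("å"))  # Ã¥
print(garble("ø"))  # Ã¸

# If the text was garbled exactly once, the process is reversible.
print(repair(garble("å")))  # å
```

If the feed text really was stored correctly and only mis-decoded on the way out, fixing the connection or conversion setting is the proper cure; repair() is just a way to confirm the diagnosis on a sample string.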
Vr.Gr,
Rob
--
+----------------------------------------------------------------------+
| The EU constitution will turn the EU into a SU |
| Vote against the EU constitution in the referendum |
+----------------------------------------------------------------------+
Jul 20 '05 #18
On Thu, 10 Feb 2005, LarsM wrote:
X-Newsreader: Microsoft Outlook Express 6.00.2900.2180

However, the Danish special characters appear wrong.
For example the letter ? becomes "??", the letter ? becomes "??"


As long as you are unable to post special, non-ASCII characters
with appropriate MIME header in your newsreader^W Outlook Express,
don't expect anything.

You need to make these settings:

Tools > Options > Send
Mail Sending Format > Plain Text Settings > Message format MIME
News Sending Format > Plain Text Settings > Message format MIME
Encode text using: None

Better yet, get a newsreader instead of OE.

--
Top-posting.
What's the most irritating thing on Usenet?

Jul 20 '05 #19
