UTF-8 encoding decoding not working with Danish characters

Hi all,
I am new to XML, but I use it for an RSS feed.

I have one problem, which I have really been struggling with.

My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.

However, the Danish special characters appear wrong.

For example the letter å becomes "Ã¥", the letter ø becomes "Ã¸"

See an example here:
http://netm.dk/blog/rss/index_rss2.xml

I thought that it could be because the encoding was not set in the document,
so I added this:
<?xml version="1.0" encoding="UTF-8" ?>
However, that did not make any difference, as can be seen here:
http://netm.dk/blog/rss/test_rss2.xml

The text decodes correctly on my regular web pages on http://netm.dk/

What am I doing wrong?

Regards,
Lars
www.netm.dk


Jul 20 '05 #1
LarsM wrote:
Hi all,
I am new to XML, but I use it for an RSS feed.

I have one problem, which I have really been struggling with.

My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.

However, the Danish special characters appear wrong.

For example the letter å becomes "Ã¥", the letter ø becomes "Ã¸"

See an example here:
http://netm.dk/blog/rss/index_rss2.xml

I thought that it could be because the encoding was not set in the document,
so I added this:
<?xml version="1.0" encoding="UTF-8" ?>
However, that did not make any difference, as can be seen here:
http://netm.dk/blog/rss/test_rss2.xml

The text decodes correctly on my regular web pages on http://netm.dk/

What am I doing wrong?

Regards,
Lars
www.netm.dk

This is not limited to XML. I try to send JavaMail mails. When doing
this from a Windows PC, Danish characters are garbled, when running the
exact same program on Linux, the characters get through fine.

Hope we get rid of those %@ darned NLS issues sometime in my lifetime,
but I doubt it.
Jul 20 '05 #2
LarsM wrote:
My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.
You have to take care that *every* tool in the toolchain
knows how to handle utf-8 correctly. Maybe you could give us
a list of the tools involved?
The text decodes correctly on my regular web pages on http://netm.dk/


Your web page looks OK to me.
I bet it is in the database or shortly thereafter.
Jul 20 '05 #3

"Jürgen Kahrs" wrote:

Maybe you could give us a list of the tools involved?


Thanks Jürgen,
The RSS feed is being generated by the same Blog application
("Boastmachine"), which I use to generate the Web pages. As far as I know it
accesses the database in the same way as for the "real" pages.
But I will check up on that.
-Lars


Jul 20 '05 #4
LarsM wrote:
The RSS feed is being generated by the same Blog application
("Boastmachine"), which I use to generate the Web pages. As far as I know it
accesses the database in the same way as for the "real" pages.
So the problem should be in the Blog application.
But I will check up on that.


Good idea. Maybe there is simply a bug in the RSS
extraction mechanism.
Jul 20 '05 #5
LarsM wrote:
Hi all,
I am new to XML, but I use it for an RSS feed.

I have one problem, which I have really been struggling with.

My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.

However, the Danish special characters appear wrong.

For example the letter å becomes "Ã¥", the letter ø becomes "Ã¸"

See an example here:
http://netm.dk/blog/rss/index_rss2.xml

I thought that it could be because the encoding was not set in the document,
so I added this:
<?xml version="1.0" encoding="UTF-8" ?>
However, that did not make any difference, as can be seen here:
http://netm.dk/blog/rss/test_rss2.xml

The text decodes correctly on my regular web pages on http://netm.dk/

What am I doing wrong?

Regards,
Lars
www.netm.dk

Pointing my (Linux) Firefox browser at your web site, and having
encoding set to utf-8, I see your page fine. Setting encoding to
ISO-8859-1 generates the Ã¥ stuff. One never knows how the users'
browsers are set up.

Look at this page: www.vietbao.com

Great looking, authentic, Vietnamese fonts with utf-8. Obviously not
looking good with iso (vn fonts not part of iso..).
Jul 20 '05 #6

"Malte" wrote:
Pointing my (Linux) Firefox browser at your web site, and having encoding
set to utf-8, I see your page fine. Setting encoding to ISO-8859-1
generates the Ã¥ stuff. One never knows how the users' browsers are set up.


Is that looking at http://netm.dk/blog/rss/index_rss2.xml also?

-Lars
Jul 20 '05 #7
LarsM wrote:
Hi all,
I am new to XML, but I use it for an RSS feed.

I have one problem, which I have really been struggling with.

My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.


No. It's ASCII encoded before an agent even looks at the document itself.
See RFC3023 for details.

The good news is that the fix is a single line in httpd.conf.

--
Nick Kew
Jul 20 '05 #8
/LarsM/:
My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.

However, the Danish special characters appear wrong.

For example the letter å becomes "Ã¥", the letter ø becomes "Ã¸"

See an example here:
http://netm.dk/blog/rss/index_rss2.xml


Sounds like a MySQL configuration issue to me.

--
Stanimir
Jul 20 '05 #9

"Nick Kew" wrote:

The good news is that the fix is a single line in httpd.conf.


I don't have my own Apache server, but am using an ISP (Freepaq.dk). Where
can I make the configuration change, then?

-Lars
Jul 20 '05 #10
LarsM wrote:
I don't have my own Apache server, but am using an ISP (Freepaq.dk). Where
can I make the configuration change, then?


In a .htaccess file if your host allows it. Failing that, you could ask
your host to map .xml to application/xml. Failing that, I recommend
switching to another host.

--
Henri Sivonen
hs******@iki.fi
http://iki.fi/hsivonen/
Jul 20 '05 #11
LarsM wrote:
"Malte" wrote:
Pointing my (Linux) Firefox browser at your web site, and having encoding
set to utf-8, I see your page fine. Setting encoding to ISO-8859-1
generates the Ã¥ stuff. One never knows how the users' browsers are set up.

Is that looking at http://netm.dk/blog/rss/index_rss2.xml also?

-Lars


That gives me the funny looking chars as well, regardless of encoding
settings in the browser.

BTW, solved my JavaMail NLS problem. Had Tomcat start with the
-DEncoding parm set.
Jul 20 '05 #12

"Henri Sivonen" wrote:
In a .htaccess file if your host allows it. Failing that, you could ask
your host to map .xml to application/xml. Failing that, I recommend
switching to another host.


I've been reading through the RFC, but please enlighten me. What would the
syntax be for setting this? Please be as specific as possible.

Regards,
Lars
Jul 20 '05 #13
Hi there
Henri Sivonen wrote:
In a .htaccess file if your host allows it. Failing that, you could ask
your host to map .xml to application/xml. Failing that, I recommend
switching to another host.


lynx -head http://netm.dk/blog/rss/index_rss2.xml
HTTP/1.0 200 OK
Date: Thu, 10 Feb 2005 14:46:31 GMT
Server: Apache/1.3.33 (Unix) mod_perl/1.29 DAV/1.0.3 mod_gzip/1.3.26.1a
PHP/4.3.9
Last-Modified: Tue, 08 Feb 2005 08:03:13 GMT
ETag: "bd67c1-1141-42087241"
Accept-Ranges: bytes
Content-Length: 4417
Content-Type: application/xml
Age: 704
X-Cache: HIT from www.sput.nl
X-Cache-Lookup: HIT from www.sput.nl:8080
Proxy-Connection: close

lynx -head http://netm.dk/blog/rss/test_rss2.xml
HTTP/1.0 200 OK
Date: Thu, 10 Feb 2005 14:48:18 GMT
Server: Apache/1.3.33 (Unix) mod_perl/1.29 DAV/1.0.3 mod_gzip/1.3.26.1a
PHP/4.3.9
Last-Modified: Mon, 07 Feb 2005 18:45:44 GMT
ETag: "11e2dc0-1022-4207b758"
Accept-Ranges: bytes
Content-Length: 4130
Content-Type: application/xml
Age: 624
X-Cache: HIT from www.sput.nl
X-Cache-Lookup: HIT from www.sput.nl:8080
Proxy-Connection: close

This one on my box;
lynx -head http://www.sput.nl/software/leased-line/leased-line.xml
HTTP/1.1 200 OK
Date: Thu, 10 Feb 2005 15:00:02 GMT
Server: Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2
Last-Modified: Sun, 30 Jan 2005 07:44:42 GMT
ETag: "2787c-4840-41fc906a"
Accept-Ranges: bytes
Content-Length: 18496
Connection: close
Content-Type: text/xml; charset=UTF-8

However, my browser does consider all these files to be UTF-8 XML.
Regards,
Rob
--
+----------------------------------------------------------------------+
| The EU constitution will turn the EU into a SU |
| Vote against the EU constitution in the referendum |
+----------------------------------------------------------------------+
Jul 20 '05 #14
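
A note on the headers above: as I read RFC 3023, which charset a client assumes depends on both the media type and the charset parameter. The Python sketch below is only an illustration of that decision. The function name effective_xml_charset is made up for this example, and the BOM and declaration sniffing is simplified compared to what a real XML parser does.

```python
import re


def effective_xml_charset(content_type: str, body: bytes) -> str:
    """Guess the charset a client would assume for an XML response.

    Simplified reading of RFC 3023:
      - an explicit charset parameter wins for both media types
      - text/xml without charset is treated as us-ascii
      - application/xml without charset falls back to the document itself
        (byte order mark, then the encoding declaration, then UTF-8)
    """
    parts = [p.strip() for p in content_type.split(";")]
    media_type = parts[0].lower()

    # Collect parameters such as charset=UTF-8
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip().lower()] = value.strip().strip('"')

    if "charset" in params:
        return params["charset"].lower()
    if media_type == "text/xml":
        return "us-ascii"

    # application/xml: look inside the document
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    match = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']', body
    )
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


declared = b'<?xml version="1.0" encoding="UTF-8" ?><rss/>'
print(effective_xml_charset("application/xml", declared))          # utf-8
print(effective_xml_charset("text/xml", declared))                 # us-ascii
print(effective_xml_charset("text/xml; charset=UTF-8", b"<rss/>")) # utf-8
```

If this reading is right, a feed served as application/xml with a UTF-8 declaration should already be taken as UTF-8, so wrong characters would have to come from the bytes in the file rather than from the header.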
Sorry, but exactly how do I set that setting, which Nick Kew and Henri
Sivonen suggested?

I have been reading through the RFC, but it is not completely clear to me...

Cheers,
Lars
www.netm.dk
Jul 20 '05 #15
/LarsM/:
My XML document is generated from the contents of a MySQL database. It is UTF-8 encoded.

However, the Danish special characters appear wrong.

For example the letter å becomes "Ã¥", the letter ø becomes "Ã¸"

See an example here:
http://netm.dk/blog/rss/index_rss2.xml


Sounds like a MySQL configuration issue to me.


Sorry, but exactly how do I set that setting, which Nick Kew and Henri
Sivonen suggested?

I have been reading through the RFC, but it is not completely clear to me...


Please, quote at least some relevant text from the post you're
replying to.

What I meant is, AFAIK MySQL versions prior to 4.1 don't handle
Unicode characters. I have no experience with the 4.1 version but
it seems the encoding configuration could be tricky with it, too.

It could happen that text is inserted into the DB using one
encoding and read using another (depending on the connection driver
configuration), producing different results. So, I guess, somehow the
info is inserted UTF-8 encoded but then read using ISO-8859-1, for
example. Generally it has nothing to do with RFCs but with MySQL-specific
configuration.

I've worked on an application which used MySQL 4.0 as data store and
because it was targeted for the Japanese market we had to configure
the connection driver specifically to encode/decode using a
Shift_JIS encoding.

--
Stanimir
Jul 20 '05 #16
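
One way to check the "written as UTF-8, read as ISO-8859-1" guess is to try undoing exactly that mistake on a sample string and see whether sensible text comes out. This is a rough Python sketch, not a general fix. The function name repair_double_encoded is invented for the example, and it assumes the wrong step really was ISO-8859-1 and happened only once.

```python
def repair_double_encoded(text: str) -> str:
    """Undo one round of 'UTF-8 bytes read as ISO-8859-1'.

    If the text cannot have been produced that way, it is returned unchanged.
    """
    try:
        # Turn the characters back into the bytes they were misread from,
        # then decode those bytes the way they were meant to be decoded.
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


correct = "r\u00f8dgr\u00f8d"  # "rødgrød"

# Simulate the suspected fault: UTF-8 bytes interpreted as ISO-8859-1
garbled = correct.encode("utf-8").decode("iso-8859-1")

print(garbled)
print(repair_double_encoded(garbled))
print(repair_double_encoded(correct))  # already correct, left alone
```

If running something like this over the feed text gives back proper Danish, the mismatch is most likely somewhere between the database and the feed generator.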

"Stanimir Stamenkov" wrote :


What I meant is, AFAIK MySQL versions prior to 4.1 don't handle Unicode
characters. I have no experience with the 4.1 version but it seems the
encoding configuration could be tricky with it, too.

Thank you Stanimir. I think my Web host is on 4.0 only. I will look into
that and maybe go for another encoding all the way through...
Sorry about not quoting correctly...

Regards,
Lars
www.netm.dk
Jul 20 '05 #17
Hi there
LarsM wrote:
I am new to XML, but I use it for an RSS feed.

I have one problem, which I have really been struggling with.

My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.

However, the Danish special characters appear wrong.

For example the letter å becomes "Ã¥", the letter ø becomes "Ã¸"


In ISO-8859-1, a-ring is 0xE5; in UTF-8 it is 0xC3 0xA5.
Read as ISO-8859-1, 0xC3 0xA5 is A-tilde followed by the yen sign.
The same applies to the other example.

So maybe the data gets stored as UTF-8 but retrieved as ISO-8859-1 and
then converted to UTF-8.
Vr.Gr,
Rob
Jul 20 '05 #18
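
The byte values in the post above can be reproduced in a few lines of Python. This is just a walk-through of the suspected chain (store as UTF-8, read as ISO-8859-1, convert to UTF-8 again). The variable names are mine, and nothing here touches MySQL itself.

```python
import unicodedata

a_ring = "\u00e5"  # the letter å

latin1_bytes = a_ring.encode("iso-8859-1")  # single byte 0xE5
utf8_bytes = a_ring.encode("utf-8")         # two bytes 0xC3 0xA5

# The two UTF-8 bytes interpreted as ISO-8859-1 give two characters
misread = utf8_bytes.decode("iso-8859-1")

# Converting that misread text to UTF-8 again doubles the damage
double_encoded = misread.encode("utf-8")

print(latin1_bytes.hex())    # e5
print(utf8_bytes.hex())      # c3a5
for ch in misread:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
print(double_encoded.hex())  # c383c2a5
```

The loop prints LATIN CAPITAL LETTER A WITH TILDE and YEN SIGN, which is the A-tilde Yen pair described above. A feed containing the four bytes c3 83 c2 a5 where å should be would point to this kind of double conversion.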
On Thu, 10 Feb 2005, LarsM wrote:
X-Newsreader: Microsoft Outlook Express 6.00.2900.2180

However, the Danish special characters appear wrong.
For example the letter ? becomes "??", the letter ? becomes "??"


As long as you are unable to post special, non-ASCII characters
with appropriate MIME header in your newsreader^W Outlook Express,
don't expect anything.

You need to make these settings:

Tools > Options > Send
Mail Sending Format > Plain Text Settings > Message format MIME
News Sending Format > Plain Text Settings > Message format MIME
Encode text using: None

Better yet, get a newsreader instead of OE.

--
Top-posting.
What's the most irritating thing on Usenet?

Jul 20 '05 #19
