UTF-8 encoding decoding not working with Danish characters

Hi all,
I am new to XML, but I use it for an RSS feed.

I have one problem, which I have really been struggling with.

My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.

However, the Danish special characters appear wrong.

For example, the letter å becomes "Ã¥" and the letter ø becomes "Ã¸".

See an example here:
http://netm.dk/blog/rss/index_rss2.xml

I thought that it could be because the encoding was not set in the document,
so I added this:
<?xml version="1.0" encoding="UTF-8" ?>
However, that did not make any difference, as can be seen here:
http://netm.dk/blog/rss/test_rss2.xml

The text decodes correctly on my regular web pages on http://netm.dk/

What am I doing wrong?

Regards,
Lars
www.netm.dk


Jul 20 '05 #1
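
The garbling described above is what one would expect when UTF-8 bytes are decoded as ISO-8859-1: each Danish letter takes two bytes in UTF-8, and each byte is then shown as a separate Latin-1 character. A minimal Python sketch of that effect follows. The helper names `misdecode` and `repair` are hypothetical, and it is an assumption that this is the cause in Lars's feed.

```python
def misdecode(text, actual="utf-8", assumed="iso-8859-1"):
    """Encode text in its real encoding, then decode it with the wrong one."""
    return text.encode(actual).decode(assumed)


def repair(garbled, actual="utf-8", assumed="iso-8859-1"):
    """Reverse misdecode(); only works if no bytes were lost along the way."""
    return garbled.encode(assumed).decode(actual)


if __name__ == "__main__":
    # Show what each Danish letter looks like after the wrong decoding step.
    for letter in "åøæ":
        garbled = misdecode(letter)
        print(letter, "->", garbled, "->", repair(garbled))
```

If the same text goes through the wrong step twice somewhere in the toolchain, each letter grows to four characters, which is one way to tell a double conversion from a single one.
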
This is not limited to XML. I am trying to send mail with JavaMail. When I do
this from a Windows PC, the Danish characters are garbled; when I run the
exact same program on Linux, the characters get through fine.

I hope we get rid of those %@ darned NLS issues sometime in my lifetime,
but I doubt it.
Jul 20 '05 #2
LarsM wrote:
My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.

You have to take care that *every* tool in the toolchain
knows how to handle UTF-8 correctly. Maybe you could give us
a list of the tools involved?

The text decodes correctly on my regular web pages on http://netm.dk/

Your web page looks OK to me.
I bet the problem is in the database or shortly thereafter.
Jul 20 '05 #3

"Jürgen Kahrs" wrote:

Maybe you could give us a list of the tools involved?


Thanks Jürgen,
The RSS feed is being generated by the same Blog application
("Boastmachine"), which I use to generate the Web pages. As far as I know it
accesses the database in the same way as for the "real" pages.
But I will check up on that.
-Lars


Jul 20 '05 #4
LarsM wrote:
The RSS feed is being generated by the same Blog application
("Boastmachine"), which I use to generate the Web pages. As far as I know it
accesses the database in the same way as for the "real" pages.

So the problem should be in the Blog application.

But I will check up on that.

Good idea. Maybe there is simply a bug in the RSS
extraction mechanism.
Jul 20 '05 #5
Pointing my (Linux) Firefox browser at your web site with the
encoding set to UTF-8, I see your page fine. Setting the encoding to
ISO-8859-1 generates the Ã¥ stuff. One never knows how the users'
browsers are set up.

Look at this page: www.vietbao.com

Great looking, authentic Vietnamese fonts with UTF-8. Obviously not
looking good with ISO-8859-1 (Vietnamese characters are not part of it).
Jul 20 '05 #6

"Malte" wrote:
Pointing my (Linux) Firefox browser at your web site with the encoding
set to UTF-8, I see your page fine. Setting the encoding to ISO-8859-1
generates the Ã¥ stuff. One never knows how the users' browsers are set up.


Is that also the case when looking at http://netm.dk/blog/rss/index_rss2.xml?

-Lars
Jul 20 '05 #7
LarsM wrote:
My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.


No. It's ASCII encoded before an agent even looks at the document itself.
See RFC 3023 for details.

The good news is that the fix is a single line in httpd.conf.

--
Nick Kew
Jul 20 '05 #8
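
Nick's point can be illustrated with a simplified sketch of the RFC 3023 rules. Under that RFC, a `text/xml` response with no `charset` parameter is treated as us-ascii regardless of the `encoding` in the XML declaration, whereas `application/xml` without a `charset` falls back to what the document itself declares. The function name `effective_charset` and its parameters are hypothetical. The thread does not show which Content-Type header the server actually sent, so the `text/xml` case is an assumption, and the sketch ignores byte order marks and the `+xml` suffix types.

```python
def effective_charset(content_type, declared_encoding=None):
    """Return the charset an RFC 3023 consumer would assume (simplified).

    content_type      -- value of the HTTP Content-Type header
    declared_encoding -- encoding from the XML declaration, if any
    Only meaningful for text/xml and application/xml.
    """
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower()

    # An explicit charset parameter on the header is authoritative.
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower()

    # text/xml with no charset parameter defaults to us-ascii,
    # so the XML declaration is never consulted.
    if media_type == "text/xml":
        return "us-ascii"

    # application/xml: fall back to the document, then to UTF-8.
    return (declared_encoding or "utf-8").lower()


if __name__ == "__main__":
    print(effective_charset("text/xml", "UTF-8"))
    print(effective_charset("text/xml; charset=UTF-8", "UTF-8"))
    print(effective_charset("application/xml", "UTF-8"))
```

One directive that would add the missing parameter is mod_mime's `AddCharset UTF-8 .xml`; whether that is the single line Nick had in mind is an assumption.
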
/LarsM/:
My XML document is generated from the contents of a MySQL database. It is
UTF-8 encoded.

However, the Danish special characters appear wrong.

For example, the letter å becomes "Ã¥" and the letter ø becomes "Ã¸".

See an example here:
http://netm.dk/blog/rss/index_rss2.xml


Sounds like a MySQL configuration issue to me.

--
Stanimir
Jul 20 '05 #9

"Nick Kew" wrote:

The good news is that the fix is a single line in httpd.conf.


I don't have my own Apache server, but am using an ISP (Freepaq.dk). Where
can I make the configuration change, then?

-Lars
Jul 20 '05 #10
