Charset decoding problem

Dormilich
8,658 Expert Mod 8TB
Hi,

I've got a very strange problem with UTF-8 encoded data outside the ASCII range.

While everything went smoothly on localhost, the same pages on the server show � for Latin-1 characters (ä, ö, ü, ß, ...) and ? for characters above the Latin-1 range (typographic characters). Even support doesn't have a clue that could help me.

reference: http://test.kulturbeutel-leipzig.net/main.php?f=presse

JavaScript on: everything works fine (data are fetched directly from a MySQL DB via AJAX).
JavaScript off: (a bit more complicated) data are fetched from the DB (stored there as WDDX serialized data) and deserialized into an object, which in turn is responsible for the output.

Maybe there's a problem with the deserialization.

Does anyone have an idea how I can find the source of the problem?

thanks

PS: the DB should contain the same data, because I used a SQL dump of one to build the other.

PPS: if you need class definitions, just ask (it would be too much to list all incorporated classes at once)

local system: Darwin Melchior 9.6.0 Darwin Kernel Version 9.6.0: Mon Nov 24 17:37:00 PST 2008; root:xnu-1228.9.59~1/RELEASE_I386 i386 / PHP 5.2.8.
(= Mac OS 10.5)

public system: Linux Custom Build 64 Bit prohost.de XEON SMP x86_64 (Red Hat Enterprise Linux) / PHP 5.2.6.
Feb 2 '09 #1
Atli
5,058 Expert 4TB
Hi.

I don't really know much about WDDX, but as I understand it, it is basically XML?
I had similar problems when passing XML files around a while ago, where the server was sending stuff as Unicode, the browser was rendering using Unicode, but the output was all mangled.

Turned out all I had to do to fix this was add:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
```
And everybody suddenly started understanding each other.

My mistake was to assume that the XML file would adopt the charset passed with a Content-Type header like HTML pages do.

Perhaps you left this out as well?
Feb 3 '09 #2
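
A quick way to rule out the missing-declaration case described above is to look at the raw XML string before it is parsed. The following is only a minimal sketch, assuming a current PHP (7.1 or later); `declared_xml_encoding` is a hypothetical helper name, not part of PHP or WDDX, and the regular expression is a rough check rather than a full XML parser.

```php
<?php
// Hypothetical helper: returns the encoding named in the XML declaration,
// or null when the declaration is missing or carries no encoding attribute.
function declared_xml_encoding(string $xml): ?string
{
    $pattern = '/^\s*<\?xml\s[^>]*?encoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1/';
    return preg_match($pattern, $xml, $m) === 1 ? $m[2] : null;
}

// A declaration with an explicit encoding, as in the fix described above.
var_dump(declared_xml_encoding('<?xml version="1.0" encoding="UTF-8" ?><wddxPacket version="1.0"/>'));

// A declaration without one: the helper returns null.
var_dump(declared_xml_encoding('<?xml version="1.0" ?><wddxPacket version="1.0"/>'));
```

If this returns null for the stored packets, the declaration is worth fixing first; if it returns UTF-8, the problem probably lies further down the pipeline.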
Dormilich
8,658 Expert Mod 8TB
Yep, WDDX is XML (useful if you have your configuration stored as XML).

But the XML header was there from the start, and obviously JavaScript has no problems at all with it.

sample WDDX:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<wddxPacket version='1.0'>
  <header>
    <comment>Zeitungsausschnitte (Text)</comment>
  </header>
  <data>
    <array length='4'>
      <string>Helena – von Äpfeln, Göttern und anderen Helden</string>
      <struct>
        <var name='php_class_name'>
          <string>wddx_presse</string>
        </var>
        <var name='name'>
          <string>p</string>
        </var>
        <var name='content'>
          <string>Auch 2004 erfreut die Schau*spiel*gruppe „Kultur*beutel“ wieder […]</string>
        </var>
        […]
      </struct>
    </array>
  </data>
</wddxPacket>
```
note * = soft hyphen (escaped by bytes' editor)
Feb 3 '09 #3
Dormilich
8,658 Expert Mod 8TB
There seems to be something wrong with the deserializer. After some testing I can say that the problems occur right after deserialization.

Does anyone know how I can determine the encoding/charset of a variable's content? (That would be interesting to know.)

thanks
Feb 3 '09 #4
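
A PHP string carries no charset label, so the encoding can only be inferred from its bytes. The sketch below, assuming a current PHP (7.0 or later), shows two cheap checks: whether the bytes form valid UTF-8, and what the raw bytes actually are. `describe_string` is a hypothetical helper name. Note the limits: invalid UTF-8 proves the string is not UTF-8, but a valid result does not prove the opposite, since pure ASCII is valid in both UTF-8 and Latin-1.

```php
<?php
// Hypothetical helper: reports whether $value is valid UTF-8 and shows its raw bytes.
function describe_string(string $value): array
{
    return [
        // An empty pattern with the /u modifier only matches a valid UTF-8 subject.
        'valid_utf8' => preg_match('//u', $value) === 1,
        // ASCII-only strings look the same in UTF-8 and in Latin-1.
        'ascii_only' => preg_match('/^[\x00-\x7F]*\z/', $value) === 1,
        // Space-separated hex dump of the raw bytes.
        'hex'        => trim(chunk_split(bin2hex($value), 2, ' ')),
    ];
}

$utf8   = "\xC3\xA4"; // "ä" encoded as UTF-8 (two bytes)
$latin1 = "\xE4";     // "ä" encoded as ISO-8859-1 (one byte)

print_r(describe_string($utf8));   // valid_utf8 is true,  hex is "c3 a4"
print_r(describe_string($latin1)); // valid_utf8 is false, hex is "e4"
```

Running this on a value just before and just after deserialization should show at which step the bytes change.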
Atli
5,058 Expert 4TB
PHP strings (until version 6) don't have any native support for Unicode, or any other charset for that matter.
A string character is essentially the same as a byte.

Try running the variable content through utf8_encode. See if that helps any.
Feb 3 '09 #5
Dormilich
8,658 Expert Mod 8TB
@Atli
Though it converts the Latin-1 characters, it's no help with the characters initially showing up as '?' („ “ – ’ … and the like)
Feb 3 '09 #6
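
One explanation consistent with these symptoms, though the thread never confirms it, is that the strings were converted from UTF-8 to ISO-8859-1 somewhere during deserialization. Characters up to U+00FF survive as single Latin-1 bytes (shown as � on a UTF-8 page), while anything above that range has no Latin-1 byte and is replaced by a literal "?". The sketch below simulates that round trip in plain PHP, assuming PHP 7.0 or later; `to_latin1_lossy` and `latin1_to_utf8` are hypothetical helpers, the second one doing the same per-byte conversion that `utf8_encode` performs.

```php
<?php
// Hypothetical stand-in for a lossy UTF-8 -> ISO-8859-1 conversion:
// code points up to U+00FF become one byte, everything else becomes "?".
function to_latin1_lossy(string $utf8): string
{
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    $out = '';
    foreach ($chars as $char) {
        $length = strlen($char);
        if ($length === 1) {
            $out .= $char; // ASCII passes through unchanged
        } elseif ($length === 2 && ord($char[0]) <= 0xC3) {
            // U+0080..U+00FF: rebuild the single Latin-1 byte
            $out .= chr(((ord($char[0]) & 0x1F) << 6) | (ord($char[1]) & 0x3F));
        } else {
            $out .= '?'; // not representable in Latin-1, information is lost
        }
    }
    return $out;
}

// Hypothetical helper: ISO-8859-1 -> UTF-8, byte by byte.
function latin1_to_utf8(string $latin1): string
{
    $out = '';
    for ($i = 0, $n = strlen($latin1); $i < $n; $i++) {
        $byte = ord($latin1[$i]);
        $out .= $byte < 0x80
            ? chr($byte)
            : chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
    }
    return $out;
}

// U+201E and U+201C are the German quotation marks, U+2013 the en dash, U+00C4 is "Ä".
$original = "\u{201E}Kulturbeutel\u{201C} \u{2013} \u{C4}pfel";
$damaged  = to_latin1_lossy($original);
$repaired = latin1_to_utf8($damaged);

echo $repaired, "\n"; // prints "?Kulturbeutel? ? Äpfel"
```

The "Ä" comes back, but the quotation marks and the dash were already replaced by "?" before the repair step ran, so no later conversion can restore them. If this is what happens on the server, the data has to be protected before that lossy step rather than fixed afterwards.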
Dormilich
8,658 Expert Mod 8TB
Finally got the problem sorted, more or less, by converting all non-ASCII characters to Unicode entities with this little function: http://de2.php.net/manual/de/functio...code.php#75941
Feb 13 '09 #7
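
The linked function is not reproduced here, so the following is only a sketch of the general idea, not that function: every non-ASCII character of a UTF-8 string is replaced by a decimal numeric character reference, which leaves a pure ASCII string that survives any later charset conversion. It assumes PHP 7.0 or later, and `utf8_to_numeric_entities` is a hypothetical name.

```php
<?php
// Hypothetical helper: replaces every non-ASCII character of a UTF-8 string
// with a decimal numeric character reference such as &#228;.
function utf8_to_numeric_entities(string $utf8): string
{
    $result = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $match): string {
            $char   = $match[0];
            $length = strlen($char); // 2 to 4 bytes for a non-ASCII character
            // Payload bits of the lead byte: 5, 4 or 3 bits for 2, 3 or 4 byte sequences.
            $codePoint = ord($char[0]) & (0xFF >> ($length + 1));
            for ($i = 1; $i < $length; $i++) {
                // Each continuation byte contributes its low 6 bits.
                $codePoint = ($codePoint << 6) | (ord($char[$i]) & 0x3F);
            }
            return '&#' . $codePoint . ';';
        },
        $utf8
    );
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $result;
}

// The title from the sample packet: U+2013 en dash, U+00C4 "Ä", U+00F6 "ö".
$title = "Helena \u{2013} von \u{C4}pfeln, G\u{F6}ttern und anderen Helden";
echo utf8_to_numeric_entities($title), "\n";
// prints "Helena &#8211; von &#196;pfeln, G&#246;ttern und anderen Helden"
```

The trade-off is that the stored text is now HTML-specific: it renders correctly in a browser but needs decoding again before it can be used anywhere else.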
xaxis
15
@Dormilich
Very interesting indeed. Interesting enough that I scoured the net, and I believe this resource: http://www.mozilla.org/projects/intl...Detection.html is the most detailed one available and the closest any person or group has yet come to solving this extremely challenging problem.
Feb 13 '09 #8
