
Does PHP send out corrupted string ? (charset issue)

Gulzor
P: 27
Hi,

I fetch web pages using Zend_Http (I send out POST data, fetch the results, and so on)
I have no problem with that.

I ran mb_detect_encoding() on the returned HTML, and the function says it is UTF-8 encoded.

Parts of the returned HTML must be sent back to the server. I store these parts in PHP strings.

The problem is that when I send these PHP strings back, all special characters (accents) are replaced with garbage!

---

(-) The PHP script itself is saved in UTF-8.

(-) I tried to utf8_encode() the returned HTML before storing data into PHP strings
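
A possible pitfall with the second point, sketched under the assumption that the returned HTML really is UTF-8 already: utf8_encode() treats every input byte as ISO-8859-1, so applying it to UTF-8 data encodes each accent a second time. The sample string is made up for illustration, and mb_convert_encoding() stands in for utf8_encode() (same conversion, and utf8_encode() is deprecated as of PHP 8.2).

```php
<?php
// "café" already encoded as UTF-8 (the é is the two bytes C3 A9).
$utf8 = "caf\xC3\xA9";

// Same conversion utf8_encode() performs: treat every input byte as ISO-8859-1.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // 636166c3a9
echo bin2hex($double), "\n"; // 636166c383c2a9, which displays as "cafÃ©"
```

If the strings sent back look like "Ã©" on the receiving side, double encoding of this kind is one thing worth ruling out.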

Do you have any tips ? Something trivial that I am missing ?

Thank you
Aug 28 '08 #1
7 Replies


100+
P: 310
I don't know if this helps you at all, but I had an issue with characters when I migrated an application from one server to another. On the new server, Danish-specific letters suddenly stopped printing correctly.

A tip on this forum led me to include this line in the header of my HTML output:

    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

This tells the browser which charset to expect. Apparently the older server was configured in such a way that this line was not necessary.

Again, maybe this is 100 miles away from your problem!
Aug 28 '08 #2

Gulzor
P: 27
I tried this when I used DOMDocument::loadHTML() to query the HTML for the data, but the problem remains the same:

When I send the data back to the server (through HTTP POST), it seems that the data is corrupted.

Note that my script does not print anything to a browser; it runs on the command line.
Aug 28 '08 #3

Atli
Expert 5K+
P: 5,058
If you are sending this via an HTTP request, you may have to specify the charset in the Content-Type header. Like:

    Content-Type: text/html; charset=utf-8
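
For a form-style POST, a rough sketch of the same idea could look like the following. It assumes the data is sent as application/x-www-form-urlencoded rather than text/html, and the field name `comment` is made up. Whether the server honors the charset parameter is up to the server, and how the headers are handed to the HTTP client depends on the client.

```php
<?php
// Hypothetical form field; the value is "café" as UTF-8 bytes.
$fields = ['comment' => "caf\xC3\xA9"];

// Percent-encoding works on raw bytes, so the string must already be UTF-8 here.
$body = http_build_query($fields);

// Headers that announce the charset of the request body.
$headers = [
    'Content-Type: application/x-www-form-urlencoded; charset=UTF-8',
    'Content-Length: ' . strlen($body),
];

echo $body, "\n"; // comment=caf%C3%A9
```

If the body shows %E9 instead of %C3%A9, the string was ISO-8859-1 at the moment it was encoded, no matter what the header claims.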
Aug 28 '08 #4

Gulzor
P: 27
Of course ! Will do. Thank you.
Aug 29 '08 #5

Gulzor
P: 27
Didn't work. I really don't know what I can do...
Sep 1 '08 #6

pbmods
Expert 5K+
P: 5,821
Heya, Gulzor.

mb_detect_encoding() is very, very timid. It will almost always say 'UTF-8', even when the string is actually not.

Try this:

    // Convert only when the string is not detected as UTF-8.
    if( mb_detect_encoding($str . 'a', 'ISO-8859-1,UTF-8') != 'UTF-8' )
    {
      // utf8_encode() returns the converted string; it does not modify $str in place.
      $str = utf8_encode($str);
    }
For more info on why this works, check out my blog:
http://blog.pbmods.com/2008/07/01/fa...-utf-8-part-2/
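
As an alternative to detection, here is a minimal sketch that validates instead of guessing. It assumes anything that is not valid UTF-8 is ISO-8859-1 (which may not hold for your pages), requires the mbstring extension, and uses PHP 7+ type declarations. The helper name `to_utf8` is made up for this example.

```php
<?php
// Hypothetical helper: return $str as UTF-8, assuming non-UTF-8 input is ISO-8859-1.
function to_utf8(string $str): string
{
    // mb_check_encoding() validates the byte sequence rather than guessing.
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "caf\xE9";     // "café" in ISO-8859-1
$utf8   = "caf\xC3\xA9"; // "café" in UTF-8

echo bin2hex(to_utf8($latin1)), "\n"; // 636166c3a9
echo bin2hex(to_utf8($utf8)), "\n";   // 636166c3a9
```

Both inputs end up as the same bytes, and a string that is already UTF-8 is never converted twice.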
Sep 1 '08 #7

Gulzor
P: 27
I tried but it still doesn't work.

    mb_detect_encoding($str.'a', 'ISO-8859-1,UTF-8')

does not return the same value as

    mb_detect_encoding($str.'a', 'UTF-8,ISO-8859-1')

When I output debug messages, the strings that I send back to the server and the strings returned from the server look the same...

Aaaargh !!! it is getting on my nerves
Sep 4 '08 #8
