
Does PHP send out corrupted strings? (charset issue)

Gulzor
Newbie
 
Join Date: Jul 2008
Location: Brussels, Belgium
Posts: 27
#1: Aug 28 '08
Hi,

I fetch web pages using Zend_Http (I send out POST data, fetch the results, and so on)
I have no problem with that.

I ran mb_detect_encoding() on the returned HTML, and the function says it is UTF-8 encoded.

Parts of the returned HTML must be sent back to the server. I store these parts in PHP strings.

The problem is that when I send these PHP strings back, all special characters (accents) are replaced with garbage!

---

(-) The PHP script itself is saved in UTF-8.

(-) I tried to utf8_encode() the returned HTML before storing data into PHP strings
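
One thing worth checking with the second point: utf8_encode() assumes its input is ISO-8859-1, so applying it to data that is already UTF-8 encodes every byte a second time. The sketch below illustrates that effect on a single character. It uses mb_convert_encoding() from ISO-8859-1 to UTF-8, which performs the same conversion as utf8_encode(); the variable names are only illustrative.

```php
<?php
// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$utf8 = "\xC3\xA9";

// Treat those two bytes as ISO-8859-1 and convert them to UTF-8 again.
// This is the conversion utf8_encode() performs, so each byte of the
// original character is encoded separately.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // c3a9
echo bin2hex($double), "\n"; // c383c2a9, which displays as "Ã©"
```

If the garbage in the posted data looks like two odd characters per accented letter, double encoding of this kind is a plausible cause.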

Do you have any tips ? Something trivial that I am missing ?

Thank you
Needs Regular Fix
 
Join Date: Mar 2008
Posts: 311
#2: Aug 28 '08

I don't know if this helps you at all, but I had an issue with characters when I migrated an application from one server to another. Suddenly, on the new server, I had difficulty getting Danish-specific letters to print correctly.

A tip on this forum led me to include this line in the header of my HTML output:

    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
which tells the browser which character encoding to use. Apparently the older server was configured so that this line was not necessary.

Again, maybe this is 100 miles away from your problem!
Gulzor
Newbie
 
Join Date: Jul 2008
Location: Brussels, Belgium
Posts: 27
#3: Aug 28 '08

I tried this when I used DOMDocument::loadHTML() to query the HTML for the data, but the problem remains the same:

When I send back the data to the server (through HTTP POST), it seems that the data are corrupted.

Note that my script does not print anything to a browser; it runs on the command line.
Atli
Moderator
 
Join Date: Nov 2006
Location: Iceland
Posts: 3,745
#4: Aug 28 '08

If you are sending this via an HTTP request, you may have to specify the charset in the Content-Type header, like:

    Content-Type: text/html; charset=utf-8
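
For a form-style POST, a header alone may not be enough, because the receiving server may decode the bytes in whatever charset it expects and ignore a charset parameter. A minimal sketch of building the POST body in an explicit target charset follows. build_post_body() and the 'city' field are hypothetical names for illustration, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: build a form-encoded POST body in the charset the
// receiving server expects. $fields is assumed to hold valid UTF-8 strings.
function build_post_body(array $fields, string $targetCharset = 'UTF-8'): string
{
    $converted = [];
    foreach ($fields as $name => $value) {
        // Convert each value from UTF-8 to the target charset first.
        $converted[$name] = mb_convert_encoding($value, $targetCharset, 'UTF-8');
    }
    // http_build_query() percent-encodes the raw bytes of each value.
    return http_build_query($converted);
}

$fields = ['city' => "Li\xC3\xA8ge"]; // "Liège" as UTF-8 bytes

$utf8Body   = build_post_body($fields);               // city=Li%C3%A8ge
$latin1Body = build_post_body($fields, 'ISO-8859-1'); // city=Li%E8ge

// Header to send along with the UTF-8 body.
$headers = ['Content-Type: application/x-www-form-urlencoded; charset=UTF-8'];

echo $utf8Body, "\n", $latin1Body, "\n";
```

Comparing the two bodies against what a browser sends for the same form can show which charset the server actually expects.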
Gulzor
Newbie
 
Join Date: Jul 2008
Location: Brussels, Belgium
Posts: 27
#5: Aug 29 '08

Of course ! Will do. Thank you.
Gulzor
Newbie
 
Join Date: Jul 2008
Location: Brussels, Belgium
Posts: 27
#6: Sep 1 '08

Didn't work. I really don't know what I can do...
pbmods
Site Moderator
 
Join Date: Apr 2007
Location: Texas
Posts: 5,435
#7: Sep 1 '08

Heya, Gulzor.

mb_detect_encoding() is very, very timid. It will almost always say 'UTF-8', even when the string is actually not.

Try this:

    if( mb_detect_encoding($str . 'a', 'ISO-8859-1,UTF-8') != 'UTF-8' )
    {
      // utf8_encode() returns the converted string, so assign the result.
      $str = utf8_encode($str);
    }

For more info on why this works, check out my blog:
http://blog.pbmods.com/2008/07/01/fa...-utf-8-part-2/
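
An alternative to detection is to validate the bytes directly. The following is a minimal sketch, not the method from the blog post above: to_utf8() is a hypothetical helper, and it assumes that any input which is not valid UTF-8 is ISO-8859-1.

```php
<?php
// Hypothetical helper: return $str as UTF-8.
// Assumption: input that is not valid UTF-8 is ISO-8859-1.
function to_utf8(string $str): string
{
    // mb_check_encoding() validates the byte sequence instead of guessing.
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "caf\xE9";     // "café" in ISO-8859-1
$utf8   = "caf\xC3\xA9"; // "café" in UTF-8

echo bin2hex(to_utf8($latin1)), "\n"; // 636166c3a9
echo bin2hex(to_utf8($utf8)), "\n";   // 636166c3a9
```

Because valid UTF-8 input is returned unchanged, calling the helper twice on the same string does not double-encode it.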
Gulzor
Newbie
 
Join Date: Jul 2008
Location: Brussels, Belgium
Posts: 27
#8: Sep 4 '08

I tried but it still doesn't work.

mb_detect_encoding($str.'a', 'ISO-8859-1,UTF-8')

does not return the same value as

mb_detect_encoding($str.'a', 'UTF-8,ISO-8859-1')

When I output debug messages, it looks like the strings that I send back to the server and the strings returned from the server are the same...

Aaaargh !!! it is getting on my nerves
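
Strings that look identical in debug output can still differ at the byte level, since the terminal applies its own charset when displaying them. A small sketch of comparing the raw bytes instead is shown below. dump_bytes() is a hypothetical helper, and the two sample values are made up to stand in for the sent and returned strings.

```php
<?php
// Hypothetical debug helper: describe a string by its byte length and hex
// dump, so the result does not depend on how the terminal renders it.
function dump_bytes(string $label, string $str): string
{
    return sprintf('%s (%d bytes): %s', $label, strlen($str), bin2hex($str));
}

// Made-up sample values: "é" as UTF-8 and as ISO-8859-1.
$sent     = "\xC3\xA9";
$received = "\xE9";

echo dump_bytes('sent', $sent), "\n";         // sent (2 bytes): c3a9
echo dump_bytes('received', $received), "\n"; // received (1 bytes): e9

var_dump($sent === $received); // bool(false)
```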