 | 
August 28th, 2008, 08:59 AM
|  | Newbie | | Join Date: Jul 2008 Location: Brussels, Belgium Age: 27
Posts: 26
| | Does PHP send out corrupted strings? (charset issue)
Hi,
I fetch web pages using Zend_Http (I send POST data, fetch the results, and so on).
That part works fine.
I ran mb_detect_encoding() on the returned HTML, and the function says it is UTF-8 encoded.
Parts of the returned HTML must be sent back to the server. I store these parts in PHP strings.
The problem is that when I send these PHP strings back, all special characters (accents) are truncated and replaced with garbage!
---
(-) The PHP script itself is saved in UTF-8.
(-) I tried calling utf8_encode() on the returned HTML before storing the data in PHP strings.
Do you have any tips? Is there something trivial that I am missing?
Thank you
| 
August 28th, 2008, 09:36 AM
| | Needs Regular Fix | | Join Date: Mar 2008
Posts: 305
| |
I don't know if this helps you at all, but I had an issue with characters when I migrated an application from one server to another. On the new server I suddenly had difficulty getting Danish-specific letters to print correctly.
A tip on this forum led me to include this line in the header of my HTML output:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
This tells the browser which charset to use. Apparently the older server was configured in such a way that this line was not necessary.
Again, maybe this is 100 miles away from your problem!
| 
August 28th, 2008, 10:36 AM
|  | Newbie | | Join Date: Jul 2008 Location: Brussels, Belgium Age: 27
Posts: 26
| |
I tried this when I used DOMDocument::loadHTML to query the HTML for the data, but the problem remains the same:
When I send the data back to the server (through HTTP POST), it seems that the data is corrupted.
Note that my script does not print any output; it runs on the command line.
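For reference, here is a minimal sketch of the meta-tag idea applied to DOMDocument::loadHTML. It assumes the fetched fragment really is UTF-8 and that the dom extension is loaded; load_utf8_html() is a hypothetical helper name, not something from the thread.

```php
<?php
// Hypothetical helper: parse a UTF-8 HTML fragment without letting the
// parser fall back to its default charset guess.
function load_utf8_html(string $html): DOMDocument
{
    $doc = new DOMDocument();
    // Prepending a charset declaration tells the HTML parser that the
    // bytes that follow are UTF-8.
    $hint = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">';
    // Real-world markup often triggers parser warnings; silence them here.
    @$doc->loadHTML($hint . $html);
    return $doc;
}

$doc  = load_utf8_html("<p>caf\xC3\xA9</p>"); // "café" as UTF-8 bytes
$text = $doc->getElementsByTagName('p')->item(0)->textContent;

// Dump the bytes in hex so the terminal's own charset cannot hide a problem.
echo bin2hex($text), "\n";
```

If the hex dump of the extracted text differs from the hex dump of the same characters in the original HTML, the corruption happens at parse time and not when the data is sent back.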
| 
August 28th, 2008, 10:30 PM
|  | Moderator | | Join Date: Nov 2006 Location: Iceland Age: 22
Posts: 2,777
| |
If you are sending this via an HTTP request, you may have to specify the charset in the Content-Type header, like this:
Content-Type: text/html; charset=utf-8
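As a rough sketch of what that could look like for a POST, assuming the data is sent as a URL-encoded form (so the media type is application/x-www-form-urlencoded, not text/html) and using PHP's stream context options instead of Zend_Http. build_post_options() and the field name are made up for illustration.

```php
<?php
// Hypothetical helper: build stream-context options for a UTF-8 form POST.
function build_post_options(array $fields): array
{
    // http_build_query() percent-encodes the raw bytes of each value, so
    // the values must already be UTF-8 at this point.
    $body = http_build_query($fields);

    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded; charset=UTF-8\r\n"
                       . 'Content-Length: ' . strlen($body) . "\r\n",
            'content' => $body,
        ],
    ];
}

$options = build_post_options(['name' => "Andr\xC3\xA9"]); // "André" as UTF-8
echo $options['http']['content'], "\n";

// To actually send it (the URL is up to you):
// $response = file_get_contents($url, false, stream_context_create($options));
```

Whether the receiving server honours the charset parameter depends on that server, so this is something to try, not a guaranteed fix.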
| 
August 29th, 2008, 10:04 AM
|  | Newbie | | Join Date: Jul 2008 Location: Brussels, Belgium Age: 27
Posts: 26
| |
Of course! Will do. Thank you.
| 
September 1st, 2008, 08:20 AM
|  | Newbie | | Join Date: Jul 2008 Location: Brussels, Belgium Age: 27
Posts: 26
| |
Didn't work. I really don't know what I can do...
| 
September 1st, 2008, 03:42 PM
|  | Site Moderator | | Join Date: Apr 2007 Location: Texas Age: 24
Posts: 5,306
| |
Heya, Gulzor.
mb_detect_encoding() is very, very timid. It will almost always say 'UTF-8', even when the string actually is not.
Try this:
if (mb_detect_encoding($str . 'a', 'ISO-8859-1,UTF-8') != 'UTF-8') {
    // utf8_encode() returns the converted string, so keep the result.
    $str = utf8_encode($str);
}
For more info on why this works, check out my blog: http://blog.pbmods.com/2008/07/01/fa...-utf-8-part-2/
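An alternative sketch that validates instead of detecting: mb_check_encoding() tests whether the bytes form valid UTF-8, and only if they do not is the string converted. The assumption that any non-UTF-8 input is ISO-8859-1 is mine, and ensure_utf8() is a hypothetical name. Note that utf8_encode() is deprecated as of PHP 8.2, which is why mb_convert_encoding() is used here.

```php
<?php
// Hypothetical helper: return $str as UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function ensure_utf8(string $str): string
{
    // Already valid UTF-8: converting again would double-encode it.
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    return mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex(ensure_utf8("\xE9")), "\n";     // Latin-1 "é" gets converted
echo bin2hex(ensure_utf8("\xC3\xA9")), "\n"; // UTF-8 "é" is left alone
```

The point of the guard is that running an encode step on a string that is already UTF-8 is itself a common source of garbage characters.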
September 4th, 2008, 11:13 AM
|  | Newbie | | Join Date: Jul 2008 Location: Brussels, Belgium Age: 27
Posts: 26
| |
I tried, but it still doesn't work.
mb_detect_encoding($str.'a', 'ISO-8859-1,UTF-8')
does not return the same value as
mb_detect_encoding($str.'a', 'UTF-8,ISO-8859-1')
When I output debug messages, it looks like the strings that I send back to the server and the strings returned from the server are the same...
Aaaargh!!! It is getting on my nerves.
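One way to check whether two strings that look the same in debug output really are the same is to compare their bytes. This is only a sketch with made-up sample values; in practice the two inputs would be the string received from the server and the string about to be sent back.

```php
<?php
// Three byte sequences that can all show up for the character "é".
$utf8   = "\xC3\xA9"; // correct UTF-8
$latin1 = "\xE9";     // ISO-8859-1
// UTF-8 bytes wrongly treated as ISO-8859-1 and encoded a second time.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

$samples = ['utf8' => $utf8, 'latin1' => $latin1, 'double' => $double];
foreach ($samples as $label => $bytes) {
    // bin2hex() shows the raw bytes regardless of the terminal's charset.
    printf("%-8s %s\n", $label, bin2hex($bytes));
}
```

If the outgoing string shows a doubled pattern like the third sample, an encode step has probably been applied to data that was already UTF-8.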