  #1  
August 28th, 2008, 08:59 AM
Gulzor
Newbie
 
Join Date: Jul 2008
Location: Brussels, Belgium
Age: 27
Posts: 26
Does PHP send out corrupted strings? (charset issue)

Hi,

I fetch web pages using Zend_Http (I send out POST data, fetch the results, and so on).
I have no problem with that.

I ran mb_detect_encoding() on the returned HTML, and the function says it is UTF-8 encoded.

Parts of the returned HTML must be sent back to the server. I store these parts in PHP strings.

The problem is that when I send these PHP strings back, all special characters (accents) come out truncated, with garbage in their place!

---

(-) The PHP script itself is saved in UTF-8.

(-) I tried to utf8_encode() the returned HTML before storing the data in PHP strings.

Do you have any tips? Is there something trivial that I am missing?

Thank you
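
A note on the utf8_encode() attempt mentioned above: if the fetched HTML really is UTF-8 already, running it through utf8_encode() would be expected to make things worse, because that function treats its input as ISO-8859-1 and re-encodes every byte. A minimal sketch of the effect, assuming the mbstring extension is loaded; mb_convert_encoding() stands in for utf8_encode() here because the latter is deprecated as of PHP 8.2, and the sample string is made up for illustration:

```php
<?php
// "é" as UTF-8 bytes (0xC3 0xA9), written with escapes so the
// encoding of this script file does not matter.
$utf8 = "\xC3\xA9";

// Converting from ISO-8859-1 to UTF-8 (what utf8_encode() does)
// treats each of the two bytes as a separate Latin-1 character.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // c3a9
echo bin2hex($double), "\n"; // c383c2a9
```

Displayed as UTF-8, the second string shows up as "Ã©" instead of "é", which matches the kind of garbage described in the post.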
  #2  
August 28th, 2008, 09:36 AM
Needs Regular Fix
 
Join Date: Mar 2008
Posts: 305

I don't know if this helps you at all, but I had an issue with characters when I migrated an application from one server to another. Suddenly, on the new server, I had difficulty getting Danish-specific letters to print out correctly.

A tip on this forum led me to include this line in the header of my HTML output:

  <meta http-equiv="content-type" content="text/html; charset=UTF-8">

which apparently tells the browser which charset to use. Apparently the older server was configured in such a way that this line was not necessary.

Again, maybe this is 100 miles away from your problem!
  #3  
August 28th, 2008, 10:36 AM
Gulzor

I tried this when I used DOMDocument::loadHTML() to query the HTML for the data, but the problem remains the same:

When I send the data back to the server (through HTTP POST), it seems that the data are corrupted.

Note that my script does not print anything to a browser; it runs on the command line.
  #4  
August 28th, 2008, 10:30 PM
Atli
Moderator
 
Join Date: Nov 2006
Location: Iceland
Age: 22
Posts: 2,777

If you are sending this via an HTTP request, you may have to specify the charset in the Content-Type header. Like:

  Content-Type: text/html; charset=utf-8
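
For a POST request specifically, the idea could be sketched as below. This is only an illustration under stated assumptions: it uses PHP's built-in HTTP stream wrapper rather than Zend_Http, it assumes the server expects a form-encoded body (so the media type is application/x-www-form-urlencoded instead of text/html), and the helper name build_post_options(), the field name, and the URL in the comment are all hypothetical. Whether a given server honours a charset parameter on that media type is not guaranteed.

```php
<?php
// Hypothetical helper: builds stream-context options for a
// form-encoded POST that declares UTF-8 in its Content-Type header.
function build_post_options(array $fields): array
{
    // http_build_query() percent-encodes the raw bytes of each value,
    // so UTF-8 input stays UTF-8 on the wire.
    $body = http_build_query($fields);

    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded; charset=utf-8\r\n"
                       . 'Content-Length: ' . strlen($body) . "\r\n",
            'content' => $body,
        ],
    ];
}

// "René" given as explicit UTF-8 bytes.
$options = build_post_options(['name' => "Ren\xC3\xA9"]);
echo $options['http']['content'], "\n"; // name=Ren%C3%A9

// Sending it would look roughly like this (placeholder URL, not executed here):
// $response = file_get_contents('http://example.invalid/form', false,
//                               stream_context_create($options));
```

Printing the encoded body before sending is a cheap way to confirm that the accented characters leave the script as the expected UTF-8 byte sequence.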
  #5  
August 29th, 2008, 10:04 AM
Gulzor

Of course! Will do. Thank you.
  #6  
September 1st, 2008, 08:20 AM
Gulzor

Didn't work. I really don't know what I can do...
  #7  
September 1st, 2008, 03:42 PM
pbmods
Site Moderator
 
Join Date: Apr 2007
Location: Texas
Age: 24
Posts: 5,306

Heya, Gulzor.

mb_detect_encoding() is very, very timid. It will almost always say 'UTF-8', even when the string is actually not UTF-8.

Try this:

  if( mb_detect_encoding($str . 'a', 'ISO-8859-1,UTF-8') != 'UTF-8' )
  {
    // utf8_encode() returns the converted string, so assign it back.
    $str = utf8_encode($str);
  }

For more info on why this works, check out my blog:
http://blog.pbmods.com/2008/07/01/fa...-utf-8-part-2/
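
If detection keeps giving inconsistent answers, one alternative is to skip guessing and validate instead: mb_check_encoding() reports whether a byte string is well-formed UTF-8. A minimal sketch, assuming the mbstring extension and PHP 7 or later for the type declarations; the helper name ensure_utf8() is made up, the ISO-8859-1 fallback is an assumption about what the remote server sends, and mb_convert_encoding() is used in place of utf8_encode(), which is deprecated as of PHP 8.2:

```php
<?php
// Hypothetical helper: returns $str unchanged when it is already
// well-formed UTF-8, otherwise converts it assuming ISO-8859-1.
function ensure_utf8(string $str): string
{
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }

    // Assumption: anything that is not valid UTF-8 is Latin-1.
    return mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex(ensure_utf8("\xE9")), "\n";     // c3a9 (Latin-1 "é" converted)
echo bin2hex(ensure_utf8("\xC3\xA9")), "\n"; // c3a9 (UTF-8 "é" left alone)
```

Unlike the detection approach, this never re-encodes a string that is already valid UTF-8, so it cannot double-encode.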
  #8  
September 4th, 2008, 11:13 AM
Gulzor

I tried, but it still doesn't work.

mb_detect_encoding($str.'a', 'ISO-8859-1,UTF-8')

does not return the same value as

mb_detect_encoding($str.'a', 'UTF-8,ISO-8859-1')

When I output debug messages, it looks like the strings that I send back to the server and the strings returned from the server are the same...

Aaaargh!!! It is getting on my nerves.
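
When two strings look the same in debug output, comparing their raw bytes can show whether they really are. A small sketch of that idea; the helper name debug_bytes() and the sample strings are hypothetical, chosen so that one is UTF-8 and the other ISO-8859-1:

```php
<?php
// Hypothetical helper: describes a string by length and hex bytes,
// which is independent of the terminal's own charset.
function debug_bytes(string $label, string $str): string
{
    return sprintf('%s: %d bytes, hex %s', $label, strlen($str), bin2hex($str));
}

$sent     = "caf\xC3\xA9"; // "café" encoded as UTF-8
$received = "caf\xE9";     // "café" encoded as ISO-8859-1

echo debug_bytes('sent', $sent), "\n";         // sent: 5 bytes, hex 636166c3a9
echo debug_bytes('received', $received), "\n"; // received: 4 bytes, hex 636166e9
```

A terminal may render both of these as the same word, or both as garbage, depending on its own settings, so the hex dump is the more trustworthy comparison.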