
Multibyte character?

vijay
#1: Jun 2 '08
When I do a readfile or file_get_contents on a web page, the string I
get back is corrupted for non-ASCII characters. For instance, when I do
a readfile("http://abc/def"), "São Paulo" becomes "SÃ£o Paulo" on the
calling page, although http://abc/def itself shows "São Paulo" correctly. Any
idea how to fix this problem?

Let me try to explain it more. I have two pages, http://abc/def and
http://abc/ghi.php, and I am trying to read the contents of http://abc/def
from http://abc/ghi.php.

Willem Bogaerts
#2: Jun 2 '08

re: Multibyte character?


What you get is exactly what the server sent. From your example, it appears
that the text you read is UTF-8 encoded and that the second page is (probably)
served as Latin-1. A "readfile" passes the bytes through without regard to
any encoding, which is not enough to display "human" text.
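
To illustrate the mismatch, here is a minimal sketch (assuming the mbstring extension is loaded) that takes the UTF-8 bytes of "São Paulo" and deliberately reads them as Latin-1. The variable names are only for illustration.

```php
<?php
// "São Paulo" as UTF-8 bytes: the "ã" is the two bytes 0xC3 0xA3.
$utf8 = "S\xC3\xA3o Paulo";

// Pretend those bytes are Latin-1 and re-encode them as UTF-8 for display.
// Each of the two bytes now becomes a character of its own.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // prints: 53c3a36f205061756c6f
echo $misread, "\n";         // prints: SÃ£o Paulo
```

The bytes themselves are never damaged; only the interpretation on the calling page differs.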

If you use curl, you can capture the response headers, which state the
encoding used, and convert the text with mbstring. Or, if you always read
the same page, you already know its encoding beforehand.
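
A rough sketch of that approach, assuming the curl and mbstring extensions are available. The function names (charset_from_content_type, convert_body, fetch_converted) are made up for this example, and falling back to UTF-8 when the server declares no charset is only a guess. It does not look at a charset declared inside the HTML itself, and mbstring will reject a charset name it does not recognise.

```php
<?php
// Extract the charset parameter from a Content-Type value such as
// "text/html; charset=UTF-8". Returns null when none is declared.
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Convert $body from the declared charset to $target, the encoding of the
// calling page. $assumed is used only when the header names no charset.
function convert_body(string $body, ?string $contentType, string $target,
                      string $assumed = 'UTF-8'): string
{
    $source = charset_from_content_type($contentType) ?? $assumed;
    return mb_convert_encoding($body, $target, $source);
}

// Fetch $url with curl and return its body re-encoded as $target.
function fetch_converted(string $url, string $target): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body, do not print it
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Fetch failed: ' . curl_error($ch));
    }
    // Content-Type of the response as reported by curl, if the server sent one.
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    return convert_body($body, is_string($contentType) ? $contentType : null, $target);
}

// Usage on the calling page (ghi.php), assuming it is served as Latin-1:
// echo fetch_converted('http://abc/def', 'ISO-8859-1');
```

If the page you read is always the same, you can skip the header lookup and call mb_convert_encoding directly with the encoding you already know.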

Best regards.
--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/