vijay wrote:
When I do a readfile or file_get_contents on a web page, the string I
get back gets corrupted for non-ASCII characters. For instance, when I do
a readfile("http://abc/def"), "São Paulo" becomes "São Paulo" on the
calling page, although http://abc/def shows "São Paulo" correctly. Any
idea how to fix this problem?
Let me try to explain it more. I have two pages, http://abc/def and
http://abc/ghi.php, and I am trying to read the contents of
http://abc/def from http://abc/ghi.php.
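What you get is exactly right: the bytes arrive unchanged. From your
example, it appears that your text is UTF-8 encoded and that the second
page is (probably) Latin-1 (ISO-8859-1) encoded, so every two-byte UTF-8
character is displayed as two Latin-1 characters ("ã" shows up as "ã").
A plain readfile, which ignores encodings entirely, is not enough to
display "human" text.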
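To make the diagnosis concrete, here is a small PHP illustration, assuming PHP 7+ with the mbstring extension loaded (the variable names are arbitrary). It takes "São Paulo" as UTF-8 bytes and reinterprets those bytes as Latin-1, which is effectively what the calling page does.

```php
<?php
// Illustration only: assumes PHP 7+ with mbstring; variable names are arbitrary.
$utf8 = "S\u{00E3}o Paulo";        // "São Paulo"; "ã" is the two bytes 0xC3 0xA3 in UTF-8
echo bin2hex($utf8), "\n";         // 53c3a36f205061756c6f

// A Latin-1 page reads every byte as one character: 0xC3 is "Ã", 0xA3 is "£".
// Reinterpreting the UTF-8 bytes as ISO-8859-1 reproduces what the page shows.
$asSeen = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $asSeen, "\n";                // São Paulo
```

Nothing is lost in transit; the same bytes are simply read with the wrong character set.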
If you use cURL, you can read the Content-Type response header, whose
charset parameter names the encoding used, and then use mbstring (for
example mb_convert_encoding) to convert the text. Or, if it is always
the same page you read, you know the encoding beforehand.
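A minimal sketch of the cURL-plus-mbstring approach, assuming the curl and mbstring extensions are loaded, that the remote server declares a charset in its Content-Type header (it may not), and that the calling page (ghi.php) is served as Latin-1. The helper names fetchWithContentType, charsetFromContentType and toPageEncoding are hypothetical, not part of any library.

```php
<?php
// Sketch only: helper names are hypothetical, not part of any library.
// Assumes the curl and mbstring extensions are loaded and that the
// calling page (ghi.php) is served as Latin-1 (ISO-8859-1).

// Pull the charset parameter out of a Content-Type value such as
// "text/html; charset=UTF-8". Falls back to $default when none is declared.
function charsetFromContentType(?string $contentType, string $default = 'UTF-8'): string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return $default; // nothing declared: this fallback is only a guess
}

// Convert a fetched body from its declared charset to the calling page's charset.
function toPageEncoding(string $body, ?string $contentType, string $target = 'ISO-8859-1'): string
{
    return mb_convert_encoding($body, $target, charsetFromContentType($contentType));
}

// Fetch a URL with cURL and return the raw body plus its Content-Type header.
function fetchWithContentType(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    // e.g. "text/html; charset=UTF-8"; normalized to null when the server sent none
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE) ?: null;
    return [$body, $contentType];
}
```

From ghi.php this could be used as `[$body, $type] = fetchWithContentType('http://abc/def'); echo toPageEncoding($body, $type);`. Note that an HTML page may declare its charset only in a `<meta>` tag, which this sketch does not look at; if you always read the same page, hard-coding its known encoding is simpler.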
Best regards.
--
Willem Bogaerts
Application smith
Kratz B.V.
http://www.kratz.nl/