UTF8: file_put_contents doesn't seem to write UTF8 content properly

Hi,

I'm trying to let PHP write a 'sitemap.xml' sitemap for Google and other
search engines. It's working, except that the content in the XML file doesn't
seem to be UTF-8 (which it should be, judging by the information given in
Google's webmaster help center).

The way I test whether the content is UTF-8 is by opening the XML file in
Notepad and choosing 'Save As...'. Normally the encoding option should be set
to UTF-8, but now it just shows ANSI.

This is what I have tried to write UTF8 content with:

file_put_contents( '.' . SITEMAP_FILE, utf8_encode( $this->sitemapForCrawlers ) );
...and...
file_put_contents( '.' . SITEMAP_FILE, iconv( "ISO-8859-1", "UTF8", $this->sitemapForCrawlers ) );

...where...
SITEMAP_FILE is the filename constant
...and...
$this->sitemapForCrawlers is the string with XML data

With the last attempt I even got an error saying:

Wrong charset, conversion from `ISO-8859-1' to `UTF8' is not allowed in...

Any ideas of how I can make this work?

Thanks for the input.
Jun 13 '07 #1
On Wed, 13 Jun 2007 22:25:44 +0200, "amygdala" <no*****@noreply.com> wrote:

> I'm trying to let PHP write a 'sitemap.xml' sitemap for Google and other
> search engines. It's working, except that the content in the XML file doesn't
> seem to be UTF-8 (which it should be, judging by the information given in
> Google's webmaster help center).
>
> The way I test whether the content is UTF-8 is by opening the XML file in
> Notepad and choosing 'Save As...'. Normally the encoding option should be set
> to UTF-8, but now it just shows ANSI.

Well, that's not a foolproof method...

> This is what I have tried to write UTF8 content with:
>
> file_put_contents( '.' . SITEMAP_FILE, utf8_encode( $this->sitemapForCrawlers ) );
> ...and...
> file_put_contents( '.' . SITEMAP_FILE, iconv( "ISO-8859-1", "UTF8", $this->sitemapForCrawlers ) );
>
> ...where...
> SITEMAP_FILE is the filename constant
> ...and...
> $this->sitemapForCrawlers is the string with XML data
>
> With the last attempt I even got an error saying:
>
> Wrong charset, conversion from `ISO-8859-1' to `UTF8' is not allowed in...
>
> Any ideas of how I can make this work?

Start from the beginning; what character set encoding is the original data in?
The error implies that it's not ISO-8859-1 (which does have some gaps where
characters aren't valid...)

--
Andy Hassall :: an**@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
Jun 13 '07 #2
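
A more reliable check than Notepad's 'Save As' dialog is to test whether the bytes themselves form valid UTF-8. The following is a minimal sketch, assuming a PHP build with PCRE (which is standard); `isValidUtf8` is a hypothetical helper name, not something from the thread. Keep in mind that pure ASCII content is valid UTF-8 and valid "ANSI" at the same time, which is one reason an editor may report ANSI for a file that is perfectly fine.

```php
<?php
// Hypothetical helper: returns true when $bytes is a well-formed UTF-8 string.
// preg_match() with the /u modifier returns false on malformed UTF-8 input.
function isValidUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

var_dump(isValidUtf8("na\xC3\xAFve")); // "ï" encoded as two UTF-8 bytes
var_dump(isValidUtf8("na\xEFve"));     // "ï" as a single CP1252/Latin-1 byte
var_dump(isValidUtf8("plain ASCII"));  // ASCII is a subset of UTF-8
```

A check like this only says whether the data could be UTF-8; it cannot say which single-byte encoding a non-UTF-8 file was written in.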
C.
On 13 Jun, 21:25, "amygdala" <nore...@noreply.com> wrote:

> The way I test to see if the content is UTF8, is by opening the XML file in
> notepad and choose 'save as...'. Normally the coding option should be set to
> UTF8, but now it just shows ANSI.

ROFL.

Try inserting a BOM in front of the content.

C.

Jun 13 '07 #3
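
For reference, the BOM suggestion could be sketched as below. This assumes the content is already UTF-8; the BOM only helps tools such as Notepad recognise the encoding and does not convert anything. `UTF8_BOM` and `withUtf8Bom` are hypothetical names chosen for illustration.

```php
<?php
// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: prepend a BOM unless one is already present.
function withUtf8Bom(string $utf8): string
{
    if (strncmp($utf8, UTF8_BOM, 3) === 0) {
        return $utf8;
    }
    return UTF8_BOM . $utf8;
}

// Show the first three bytes of the result in hex.
echo bin2hex(substr(withUtf8Bom('<urlset/>'), 0, 3)), "\n";
```

Whether a sitemap consumer wants or tolerates a BOM is not settled in this thread, so treat it as optional.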

"Andy Hassall" <an**@andyh.co.uk> wrote in message
news:nb********************************@4ax.com...

> On Wed, 13 Jun 2007 22:25:44 +0200, "amygdala" <no*****@noreply.com> wrote:
>
>> I'm trying to let PHP write a 'sitemap.xml' sitemap for Google and other
>> search engines. It's working, except that the content in the XML file doesn't
>> seem to be UTF-8 (which it should be, judging by the information given in
>> Google's webmaster help center).
>>
>> The way I test whether the content is UTF-8 is by opening the XML file in
>> Notepad and choosing 'Save As...'. Normally the encoding option should be set
>> to UTF-8, but now it just shows ANSI.
>
> Well, that's not a foolproof method...

I was afraid of that.

>> This is what I have tried to write UTF8 content with:
>>
>> file_put_contents( '.' . SITEMAP_FILE, utf8_encode( $this->sitemapForCrawlers ) );
>> ...and...
>> file_put_contents( '.' . SITEMAP_FILE, iconv( "ISO-8859-1", "UTF8", $this->sitemapForCrawlers ) );
>>
>> ...where...
>> SITEMAP_FILE is the filename constant
>> ...and...
>> $this->sitemapForCrawlers is the string with XML data
>>
>> With the last attempt I even got an error saying:
>>
>> Wrong charset, conversion from `ISO-8859-1' to `UTF8' is not allowed in...
>>
>> Any ideas of how I can make this work?
>
> Start from the beginning; what character set encoding is the original data
> in? The error implies that it's not ISO-8859-1 (which does have some gaps
> where characters aren't valid...)

Well... I discovered the 'Set Code Page...' option in UltraEdit, the main
editor I use to code PHP. And it tells me my PHP code files are encoded in
'1252 (ANSI - Latin I)'. So, now my next question is... what would be the
correct first parameter for the iconv function to tell it that the original
data is '1252 (ANSI - Latin I)'. I've tried numerous strings, which include:

'1252 (ANSI - Latin I)'
'1252'
'1252 ANSI'
'1252-ANSI'
'ANSI-1252'
'ANSI 1252'

...and variations.

Is there any iconv encoding table with acceptable encodings I can consult?
Also, isn't '1252 (ANSI - Latin I)' just a pimped version of ISO-8859-1?

I'm still curious about this. Please read my reply to C. also.

Thanks.
Jun 14 '07 #4

"C." <co************@gmail.com> wrote in message
news:11**********************@e9g2000prf.googlegroups.com...

> On 13 Jun, 21:25, "amygdala" <nore...@noreply.com> wrote:
>
>> The way I test to see if the content is UTF8, is by opening the XML file in
>> notepad and choose 'save as...'. Normally the coding option should be set to
>> UTF8, but now it just shows ANSI.
>
> ROFL.

Yes, very amusing. :-/

> Try inserting a BOM in front of the content.

OK, I did a little research on the BOM and came up with information that tells
me the BOM isn't particularly necessary for UTF-8. Then I ran a simple test
with utf8_encode:

<?php
echo utf8_encode( 'ï' );
?>

The output looks like it works just fine:

ï

Since I've concluded (see my reply to Andy Hassall) that my files are
encoded in '1252 (ANSI - Latin I)', and this test file was also '1252
(ANSI - Latin I)', I guess it works. So I don't necessarily have to provide a
BOM and don't have to resort to iconv. Correct?

Or am I missing something vital here?

Thanks.
Jun 14 '07 #5
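
Echoing the result to a browser proves little, because what appears depends on how the browser decodes the page. Looking at the raw bytes is less ambiguous. The sketch below assumes the literal 'ï' in the test file was saved as the single byte 0xEF, as it would be in a CP1252 or ISO-8859-1 file. It uses `iconv` for the same ISO-8859-1 to UTF-8 conversion that `utf8_encode` performs, since `utf8_encode` is deprecated as of PHP 8.2.

```php
<?php
// "\xEF" is the single byte an editor writes for "ï" in CP1252 / ISO-8859-1.
$latin1 = "\xEF";

// Same conversion utf8_encode() performs (ISO-8859-1 to UTF-8), done with iconv.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

echo bin2hex($latin1), "\n"; // prints ef   (one byte before conversion)
echo bin2hex($utf8), "\n";   // prints c3af (two bytes after conversion)
```

Note that `utf8_encode` always treats its input as ISO-8859-1, so it is only a safe choice for CP1252 data that avoids the bytes 0x80 to 0x9F.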
On Thu, 14 Jun 2007 03:39:34 +0200, "amygdala" <no*****@noreply.com> wrote:

>> Start from the beginning; what character set encoding is the original data
>> in? The error implies that it's not ISO-8859-1 (which does have some gaps
>> where characters aren't valid...)
>
> Well... I discovered the 'Set Code Page...' option in UltraEdit, the main
> editor I use to code PHP. And it tells me my PHP code files are encoded in
> '1252 (ANSI - Latin I)'.

Well... again, that's not foolproof. It's generally not possible to
definitively detect the encoding of a file. You can work out whether it's
impossible to be in a particular encoding (invalid characters or byte
sequences), and you can make some guesses on character distribution or
spellings of words, but unless it's tagged in some way (like HTML and XML, or
through another channel like HTTP headers) then it's not certain.

"Windows Codepage 1252" is a Windows character set encoding that is similar,
but not identical, to ISO-8859-1. It (1252) adds the Euro character, which
ISO-8859-1 does not have, along with a few other characters in a range (0x80
to 0x9F) that is reserved for control codes in ISO-8859-1.

Do you have any Euro currency symbols in the file?

> So, now my next question is... what would be the
> correct first parameter for the iconv function to tell it that the original
> data is '1252 (ANSI - Latin I)'. I've tried numerous strings, which include:
>
> '1252 (ANSI - Latin I)'
> '1252'
> '1252 ANSI'
> '1252-ANSI'
> 'ANSI-1252'
> 'ANSI 1252'
>
> ...and variations.
>
> Is there any iconv encoding table with acceptable encodings I can consult?

http://www.gnu.org/software/libiconv/

You possibly want:

CP1252

> Also, isn't '1252 (ANSI - Latin I)' just a pimped version of ISO-8859-1?

I should read the entire message before typing ;-)

--
Andy Hassall :: an**@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
Jun 14 '07 #6
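
The difference between the two encodings shows up as soon as a byte in the 0x80 to 0x9F range is converted. The sketch below assumes an iconv build that knows the names `CP1252`, `ISO-8859-1` and `UTF-8` (GNU libiconv and glibc both do). It also shows what happens with a name iconv does not know. The "Wrong charset ... is not allowed" message quoted earlier in the thread is what PHP reports in that situation, so the hyphenless `UTF8` may have been the culprit on that particular build, although that is an assumption.

```php
<?php
// Byte 0x80 is the euro sign in CP1252 but a control code in ISO-8859-1.
$byte = "\x80";

$fromCp1252 = iconv('CP1252', 'UTF-8', $byte);
$fromLatin1 = iconv('ISO-8859-1', 'UTF-8', $byte);

echo bin2hex($fromCp1252), "\n"; // UTF-8 bytes of U+20AC (euro sign)
echo bin2hex($fromLatin1), "\n"; // UTF-8 bytes of U+0080 (control code)

// iconv() returns false (and raises a notice or warning, suppressed here with @)
// when it does not recognise a charset name. '1252 ANSI' is one of the guesses
// tried in the thread.
$unknown = @iconv('1252 ANSI', 'UTF-8', $byte);
var_dump($unknown === false);
```

Checking the return value for `false` before writing the file avoids silently saving an empty sitemap when a charset name is rejected.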
> I'm trying to let PHP write a 'sitemap.xml' sitemap for Google and other
> searchengines. It's working, except that the content in the XML file doesn't
> seem to be UTF8. (Which it should be, judging by the information given on
> Google's webmaster helpcenter).

How can you tell? YOU tell the system what encoding is used. The system
rarely tells you, as bytes can be perfectly valid text in a lot of
encodings and look very different in each of them.

Even if the system tells you, it usually does so separately from the
text itself. Which is obvious, because you need the encoding to be able
to read the text! In webpages and e-mail, for example, headers are used
to set the encoding of the data.

I suggest you search the net for encodings and how to work with them.
This is a good start:

http://www.joelonsoftware.com/articles/Unicode.html

Good luck with the onions,
--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/
Jun 15 '07 #7
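
Putting the advice in this thread together: `file_put_contents` writes bytes exactly as given and performs no conversion, so the string has to be UTF-8 before it is written, and the XML declaration is where the file states its own encoding. The following is a minimal sketch under stated assumptions: the source data is CP1252, `buildSitemap` is a hypothetical helper, and the sitemaps.org 0.9 namespace is used for illustration rather than taken from the original code.

```php
<?php
// Hypothetical helper: build a one-URL sitemap as a UTF-8 string.
// $sourceEncoding is whatever the input really is; 'CP1252' is an assumption.
function buildSitemap(string $url, string $sourceEncoding = 'CP1252'): string
{
    $utf8Url = iconv($sourceEncoding, 'UTF-8', $url);
    if ($utf8Url === false) {
        throw new RuntimeException("Cannot convert from $sourceEncoding");
    }
    // Escape XML special characters such as & and <.
    $escaped = htmlspecialchars($utf8Url, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    // The declaration tells XML parsers which encoding the bytes are in.
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n"
        . "  <url><loc>$escaped</loc></url>\n"
        . "</urlset>\n";
}

// Write to a temporary file, then read it back and check that it is valid UTF-8.
$path = tempnam(sys_get_temp_dir(), 'sitemap');
file_put_contents($path, buildSitemap("http://example.com/na\xEFve?a=1&b=2"));
var_dump(preg_match('//u', file_get_contents($path)) === 1);
unlink($path);
```

A real sitemap would also percent-encode non-ASCII characters in URLs; that step is left out here to keep the focus on the character encoding of the file.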

"Willem Bogaerts" <w.********@kratz.maardanzonderditstuk.nl> wrote in
message news:46*********************@news.xs4all.nl...

>> I'm trying to let PHP write a 'sitemap.xml' sitemap for Google and other
>> searchengines. It's working, except that the content in the XML file doesn't
>> seem to be UTF8. (Which it should be, judging by the information given on
>> Google's webmaster helpcenter).
>
> How can you tell? YOU tell the system what encoding is used. The system
> rarely tells you, as bytes can be perfectly valid text in a lot of
> encodings and look very different in each of them.

Yes, you're right of course, what was I thinking.

> Even if the system tells you, it usually does so separately from the
> text itself. Which is obvious, because you need the encoding to be able
> to read the text! In webpages and e-mail, for example, headers are used
> to set the encoding of the data.
>
> I suggest you search the net for encodings and how to work with them.
> This is a good start:
>
> http://www.joelonsoftware.com/articles/Unicode.html

Great article. Thanks for the pointer.

> Good luck with the onions,

Hopefully that won't be necessary anymore.

Cheers.

> --
> Willem Bogaerts
>
> Application smith
> Kratz B.V.
> http://www.kratz.nl/

Jun 16 '07 #8
