UTF8: file_put_contents doesn't seem to write UTF8 content properly

Hi,

I'm trying to let PHP write a 'sitemap.xml' sitemap for Google and other
search engines. It's working, except that the content in the XML file doesn't
seem to be UTF8 (which it should be, judging by the information given in
Google's webmaster help center).

The way I test whether the content is UTF8 is by opening the XML file in
Notepad and choosing 'Save As...'. Normally the encoding option would be set to
UTF8, but now it just shows ANSI.

This is what I have tried to write UTF8 content with:

file_put_contents( '.' . SITEMAP_FILE, utf8_encode(
$this->sitemapForCrawlers ) );
...and...
file_put_contents( '.' . SITEMAP_FILE, iconv( "ISO-8859-1", "UTF8",
$this->sitemapForCrawlers ) );

...where...
SITEMAP_FILE is the filename constant
...and...
$this->sitemapForCrawlers is the string with XML data

With the last attempt I even got an error saying:

Wrong charset, conversion from `ISO-8859-1' to `UTF8' is not allowed in...
Any ideas of how I can make this work?

Thanks for the input.
Jun 13 '07 #1
On Wed, 13 Jun 2007 22:25:44 +0200, "amygdala" <no*****@noreply.com> wrote:
>I'm trying to let PHP write a 'sitemap.xml' sitemap for Google and other
>search engines. It's working, except that the content in the XML file doesn't
>seem to be UTF8. (Which it should be, judging by the information given on
>Google's webmaster helpcenter).
>
>The way I test to see if the content is UTF8, is by opening the XML file in
>notepad and choose 'save as...'. Normally the coding option should be set to
>UTF8, but now it just shows ANSI.

Well, that's not a foolproof method...

>This is what I have tried to write UTF8 content with:
>
>file_put_contents( '.' . SITEMAP_FILE, utf8_encode(
>$this->sitemapForCrawlers ) );
>...and...
>file_put_contents( '.' . SITEMAP_FILE, iconv( "ISO-8859-1", "UTF8",
>$this->sitemapForCrawlers ) );
>
>...where...
>SITEMAP_FILE is the filename constant
>...and...
>$this->sitemapForCrawlers is the string with XML data
>
>With the last attempt I even got an error saying:
>
>Wrong charset, conversion from `ISO-8859-1' to `UTF8' is not allowed in...
>
>Any ideas of how I can make this work?

Start from the beginning; what character set encoding is the original data in?
The error implies that it's not ISO-8859-1 (which does have some gaps where
characters aren't valid...)

--
Andy Hassall :: an**@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
Jun 13 '07 #2
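
One thing worth ruling out separately from the source data: PHP reports "Wrong charset, conversion from ... is not allowed" when the underlying iconv library does not recognise one of the charset names it was given, and the unhyphenated spelling "UTF8" may not be recognised by every iconv build. A minimal sketch using the registered name "UTF-8", assuming the string really is ISO-8859-1; the $latin1 sample is a hypothetical stand-in for $this->sitemapForCrawlers:

```php
<?php
// Hypothetical stand-in for $this->sitemapForCrawlers: "café" in ISO-8859-1.
$latin1 = "caf\xE9";

// "UTF-8" (with the hyphen) is the registered charset name.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

if ($utf8 === false) {
    // iconv() returns false when the conversion fails.
    echo "conversion failed\n";
} else {
    // Show the raw bytes rather than trusting how a viewer renders them.
    echo bin2hex($utf8), "\n"; // 636166c3a9
}
```

If this works where "UTF8" failed, the problem was the charset name and not the data.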
C.
On 13 Jun, 21:25, "amygdala" <nore...@noreply.com> wrote:
>The way I test to see if the content is UTF8, is by opening the XML file in
>notepad and choose 'save as...'. Normally the coding option should be set to
>UTF8, but now it just shows ANSI.

ROFL.

Try inserting a BOM in front of the content.

C.

Jun 13 '07 #3
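
For reference, the UTF-8 byte order mark is the three bytes EF BB BF. A minimal sketch of prepending it, assuming the string is already UTF-8; the sample XML and the file name are made up for illustration. Notepad uses the BOM as a hint, but XML parsers do not need it for UTF-8.

```php
<?php
// Hypothetical sample; in the thread this would be the converted sitemap string.
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<urlset/>\n";

// The UTF-8 byte order mark: EF BB BF.
$bom = "\xEF\xBB\xBF";

// Hypothetical target path in the system temp directory.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'sitemap-bom-example.xml';
file_put_contents($path, $bom . $xml);

// Read the first three bytes back to confirm the BOM was written.
$firstBytes = file_get_contents($path, false, null, 0, 3);
echo bin2hex($firstBytes), "\n"; // efbbbf
```

Note that the BOM only labels the file; it does not convert anything. Prepending it to CP1252 bytes would produce a mislabelled file.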

"Andy Hassall" <an**@andyh.co.uk> wrote in message
news:nb********************************@4ax.com...
>On Wed, 13 Jun 2007 22:25:44 +0200, "amygdala" <no*****@noreply.com>
>wrote:
>>I'm trying to let PHP write a 'sitemap.xml' sitemap for Google and other
>>search engines. It's working, except that the content in the XML file
>>doesn't seem to be UTF8. (Which it should be, judging by the information
>>given on Google's webmaster helpcenter).
>>
>>The way I test to see if the content is UTF8, is by opening the XML file
>>in notepad and choose 'save as...'. Normally the coding option should be
>>set to UTF8, but now it just shows ANSI.
>
>Well, that's not a foolproof method...

I was afraid of that.

>>This is what I have tried to write UTF8 content with:
>>
>>file_put_contents( '.' . SITEMAP_FILE, utf8_encode(
>>$this->sitemapForCrawlers ) );
>>...and...
>>file_put_contents( '.' . SITEMAP_FILE, iconv( "ISO-8859-1", "UTF8",
>>$this->sitemapForCrawlers ) );
>>
>>...where...
>>SITEMAP_FILE is the filename constant
>>...and...
>>$this->sitemapForCrawlers is the string with XML data
>>
>>With the last attempt I even got an error saying:
>>
>>Wrong charset, conversion from `ISO-8859-1' to `UTF8' is not allowed in...
>>
>>Any ideas of how I can make this work?
>
>Start from the beginning; what character set encoding is the original data
>in? The error implies that it's not ISO-8859-1 (which does have some gaps
>where characters aren't valid...)

Well... I discovered the 'Set Code Page...' option in UltraEdit, the main
editor I use to code PHP. And it tells me my PHP code files are encoded in
'1252 (ANSI - Latin I)'. So, now my next question is... what would be the
correct first parameter for the iconv function to tell it that the original
data is '1252 (ANSI - Latin I)'? I've tried numerous strings, which include:

'1252 (ANSI - Latin I)'
'1252'
'1252 ANSI'
'1252-ANSI'
'ANSI-1252'
'ANSI 1252'

...and variations.

Is there any iconv encoding table with acceptable encodings I can consult?
Also, isn't '1252 (ANSI - Latin I)' just a pimped version of ISO-8859-1?

I'm still curious about this, though. Please read my reply to C. also.

Thanks.
Jun 14 '07 #4

"C." <co************@gmail.com> wrote in message
news:11**********************@e9g2000prf.googlegroups.com...
>On 13 Jun, 21:25, "amygdala" <nore...@noreply.com> wrote:
>>The way I test to see if the content is UTF8, is by opening the XML file
>>in notepad and choose 'save as...'. Normally the coding option should be
>>set to UTF8, but now it just shows ANSI.
>
>ROFL.

Yes, very amusing. :-/

>Try inserting a BOM in front of the content.

OK, I did a little research on the BOM, and came up with information that
tells me the BOM isn't strictly necessary for UTF-8. Then I ran a simple test
with utf8_encode:

<?php
echo utf8_encode( 'ï' );
?>

The output looks like it works just fine:

ï

Since I've concluded (see my reply to Andy Hassall) that my files are
encoded in '1252 (ANSI - Latin I)', and this test file was also '1252
(ANSI - Latin I)', I guess it works. So I don't necessarily have to provide a
BOM and don't have to resort to iconv. Correct?

Or am I missing something vital here?

Thanks.
Jun 14 '07 #5
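
Echoing the result and looking at it mostly tests the viewer, which has to guess an encoding of its own. A somewhat more reliable check, sketched here, is to look at the bytes: in ISO-8859-1 and CP1252 'ï' is the single byte EF, and in UTF-8 it is the two bytes C3 AF. iconv() is used as a stand-in for utf8_encode() (which is deprecated as of PHP 8.2); both perform the same ISO-8859-1 to UTF-8 mapping.

```php
<?php
// 'ï' as a single byte, the way a CP1252/ISO-8859-1 source file stores it.
$single = "\xEF";

// Convert ISO-8859-1 to UTF-8 (the same mapping utf8_encode() applies).
$converted = iconv('ISO-8859-1', 'UTF-8', $single);

echo bin2hex($single), "\n";    // ef
echo bin2hex($converted), "\n"; // c3af
```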
On Thu, 14 Jun 2007 03:39:34 +0200, "amygdala" <no*****@noreply.com> wrote:
>>Start from the beginning; what character set encoding is the original data
>>in? The error implies that it's not ISO-8859-1 (which does have some gaps
>>where characters aren't valid...)
>
>Well... I discovered the 'Set Code Page...' option in UltraEdit, the main
>editor I use to code PHP. And it tells me my PHP code files are encoded in
>'1252 (ANSI - Latin I)'.

Well... again, that's not foolproof. It's generally not possible to
definitively detect the encoding of a file. You can work out whether it's
impossible for it to be in a particular encoding (invalid characters or byte
sequences), and you can make some guesses based on character distribution or
spellings of words, but unless it's tagged in some way (like HTML and XML, or
through another channel like HTTP headers) it's not certain.

"Windows Codepage 1252" is a Windows character set encoding that is similar,
but not identical, to ISO-8859-1. It (1252) has a few extra printable
characters, the Euro sign among them, in the 0x80-0x9F range that ISO-8859-1
reserves for control codes.

Do you have any Euro currency symbols in the file?

>So, now my next question is... what would be the
>correct first parameter for the iconv function to tell it that the original
>data is '1252 (ANSI - Latin I)'. I've tried numerous strings, which include:
>
>'1252 (ANSI - Latin I)'
>'1252'
>'1252 ANSI'
>'1252-ANSI'
>'ANSI-1252'
>'ANSI 1252'
>
>...and variations.
>
>Is there any iconv encoding table with acceptable encodings I can consult?

http://www.gnu.org/software/libiconv/

You possibly want:

CP1252

>Also, isn't '1252 (ANSI - Latin I)' just a pimped version of ISO-8859-1?

I should read the entire message before typing ;-)

--
Andy Hassall :: an**@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
Jun 14 '07 #6
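
The practical difference between the two encodings shows up in the 0x80-0x9F range. A small sketch, assuming the local iconv build recognises the name CP1252: the byte 0x80 is the Euro sign in CP1252, while a conversion from ISO-8859-1 treats it as a control character.

```php
<?php
// Byte 0x80: the Euro sign in CP1252, a control character in ISO-8859-1.
$byte = "\x80";

$asCp1252 = iconv('CP1252', 'UTF-8', $byte);
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $byte);

echo bin2hex($asCp1252), "\n"; // e282ac (U+20AC, Euro sign)
echo bin2hex($asLatin1), "\n"; // c280   (U+0080, control character)
```

This is also why utf8_encode(), which always assumes ISO-8859-1 input, can mangle text typed in a CP1252 editor when it contains Euro signs or curly quotes, even though plain accented letters come out fine.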
>I'm trying to let PHP write a 'sitemap.xml' sitemap for Google and other
>search engines. It's working, except that the content in the XML file doesn't
>seem to be UTF8. (Which it should be, judging by the information given on
>Google's webmaster helpcenter).

How can you tell? YOU tell the system what encoding is used. The system
rarely tells you, as bytes can be perfectly valid text in a lot of
encodings and look very different in each of them.

Even if the system tells you, it usually does so separately from the
text itself. Which is obvious, because you need the encoding to be able
to read the text! In webpages and e-mail, for example, headers are used
to set the encoding of the data.

I suggest you search the net for encodings and how to work with them.
This is a good start:

http://www.joelonsoftware.com/articles/Unicode.html

Good luck with the onions,
--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/
Jun 15 '07 #7
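
For a sitemap, the place to "tell the system" is the XML declaration at the top of the file. A rough sketch, assuming the string has already been converted: declare UTF-8 in the prolog and, before writing, check that the bytes are at least well-formed UTF-8. The isValidUtf8() helper and the example URL are hypothetical; the check relies on preg_match() with the u modifier failing on malformed UTF-8 subjects.

```php
<?php
// Hypothetical helper: true when $bytes is a well-formed UTF-8 sequence.
function isValidUtf8(string $bytes): bool
{
    // With the /u modifier, preg_match() returns false for malformed UTF-8.
    return preg_match('//u', $bytes) === 1;
}

// Sample sitemap body; the URL is a placeholder, not taken from the thread.
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
     . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n"
     . '  <url><loc>http://www.example.com/</loc></url>' . "\n"
     . '</urlset>' . "\n";

var_dump(isValidUtf8($xml));            // bool(true)
var_dump(isValidUtf8("caf\xC3\xA9"));   // bool(true): "café" in UTF-8
var_dump(isValidUtf8("caf\xE9"));       // bool(false): lone ISO-8859-1 byte
```

A passing check only shows that the bytes could be UTF-8. It cannot show that they were converted from the right source encoding, which is the point made above.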

"Willem Bogaerts" <w.********@kratz.maardanzonderditstuk.nl> wrote in
message news:46*********************@news.xs4all.nl...
>>I'm trying to let PHP write a 'sitemap.xml' sitemap for Google and other
>>search engines. It's working, except that the content in the XML file
>>doesn't seem to be UTF8. (Which it should be, judging by the information
>>given on Google's webmaster helpcenter).
>
>How can you tell? YOU tell the system what encoding is used. The system
>rarely tells you, as bytes can be perfectly valid text in a lot of
>encodings and look very different in each of them.

Yes, you're right of course, what was I thinking.

>Even if the system tells you, it usually does so separately from the
>text itself. Which is obvious, because you need the encoding to be able
>to read the text! In webpages and e-mail, for example, headers are used
>to set the encoding of the data.
>
>I suggest you search the net for encodings and how to work with them.
>This is a good start:
>
>http://www.joelonsoftware.com/articles/Unicode.html

Great article. Thanks for the pointer.

>Good luck with the onions,

Hopefully that won't be necessary anymore.
Cheers.

>--
>Willem Bogaerts
>
>Application smith
>Kratz B.V.
>http://www.kratz.nl/

Jun 16 '07 #8
