Hi,
I'm trying to let PHP write a 'sitemap.xml' sitemap for Google and other
search engines. It's working, except that the content in the XML file doesn't
seem to be UTF8. (Which it should be, judging by the information given on
Google's webmaster helpcenter).
The way I test to see if the content is UTF8, is by opening the XML file in
notepad and choose 'save as...'. Normally the coding option should be set to
UTF8, but now it just shows ANSI.
This is what I have tried to write UTF8 content with:
file_put_contents( '.' . SITEMAP_FILE, utf8_encode(
$this->sitemapForCrawlers ) );
....and...
file_put_contents( '.' . SITEMAP_FILE, iconv( "ISO-8859-1", "UTF8",
$this->sitemapForCrawlers ) );
....where...
SITEMAP_FILE is the filename constant
....and...
$this->sitemapForCrawlers is the string with XML data
With the last attempt I even got an error saying:
Wrong charset, conversion from `ISO-8859-1' to `UTF8' is not allowed in...
Any ideas of how I can make this work?
Thanks for the input.
On Wed, 13 Jun 2007 22:25:44 +0200, "amygdala" <no*****@noreply.com> wrote:
> I'm trying to let PHP write a 'sitemap.xml' sitemap for Google and other searchengines. It's working, except that the content in the XML file doesn't seem to be UTF8. (Which it should be, judging by the information given on Google's webmaster helpcenter).
> The way I test to see if the content is UTF8, is by opening the XML file in notepad and choose 'save as...'. Normally the coding option should be set to UTF8, but now it just shows ANSI.
Well, that's not a foolproof method...
> This is what I have tried to write UTF8 content with:
> file_put_contents( '.' . SITEMAP_FILE, utf8_encode( $this->sitemapForCrawlers ) );
> ...and...
> file_put_contents( '.' . SITEMAP_FILE, iconv( "ISO-8859-1", "UTF8", $this->sitemapForCrawlers ) );
> ...where... SITEMAP_FILE is the filename constant
> ...and... $this->sitemapForCrawlers is the string with XML data
> With the last attempt I even got an error saying:
> Wrong charset, conversion from `ISO-8859-1' to `UTF8' is not allowed in...
> Any ideas of how I can make this work?
Start from the beginning; what character set encoding is the original data in?
The error implies that it's not ISO-8859-1 (which does have some gaps where
characters aren't valid...)
--
Andy Hassall :: an**@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
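A note on the error message, offered as a sketch rather than a diagnosis: as far as I can tell, PHP's iconv() raises "Wrong charset, conversion from ... is not allowed" when the underlying iconv library does not recognise one of the charset names, before it looks at any data. Some iconv builds accept "UTF8" as an alias and some do not; the hyphenated "UTF-8" is the safer spelling. The $xml string below is a made-up stand-in for $this->sitemapForCrawlers.

```php
<?php
// Hypothetical stand-in for $this->sitemapForCrawlers:
// "ï" is stored as the single ISO-8859-1 byte 0xEF.
$xml = "<loc>http://www.example.com/na\xEFve</loc>";

// Same call as in the question, but with the charset spelled "UTF-8".
$utf8 = iconv('ISO-8859-1', 'UTF-8', $xml);

if ($utf8 === false) {
    // iconv() returns false when the conversion fails.
    echo "conversion failed\n";
} else {
    // Show what happened to the one non-ASCII byte.
    echo bin2hex("\xEF"), ' -> ', bin2hex(iconv('ISO-8859-1', 'UTF-8', "\xEF")), "\n";
    // prints: ef -> c3af
}
```

If the hyphenated name still fails, the charset names supported by the installed iconv library are the next thing to check.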
On 13 Jun, 21:25, "amygdala" <nore...@noreply.com> wrote:
> The way I test to see if the content is UTF8, is by opening the XML file in
> notepad and choose 'save as...'. Normally the coding option should be set to
> UTF8, but now it just shows ANSI.
ROFL.
Try inserting a BOM in front of the content.
C.
"Andy Hassall" <an**@andyh.co.uk> wrote in message
news:nb********************************@4ax.com...
> On Wed, 13 Jun 2007 22:25:44 +0200, "amygdala" <no*****@noreply.com>
> wrote:
>> I'm trying to let PHP write a 'sitemap.xml' sitemap for Google and other searchengines. It's working, except that the content in the XML file doesn't seem to be UTF8. (Which it should be, judging by the information given on Google's webmaster helpcenter).
>> The way I test to see if the content is UTF8, is by opening the XML file in notepad and choose 'save as...'. Normally the coding option should be set to UTF8, but now it just shows ANSI.
> Well, that's not a foolproof method...
I was afraid of that.
>> This is what I have tried to write UTF8 content with:
>> file_put_contents( '.' . SITEMAP_FILE, utf8_encode( $this->sitemapForCrawlers ) );
>> ...and...
>> file_put_contents( '.' . SITEMAP_FILE, iconv( "ISO-8859-1", "UTF8", $this->sitemapForCrawlers ) );
>> ...where... SITEMAP_FILE is the filename constant
>> ...and... $this->sitemapForCrawlers is the string with XML data
>> With the last attempt I even got an error saying:
>> Wrong charset, conversion from `ISO-8859-1' to `UTF8' is not allowed in...
>> Any ideas of how I can make this work?
> Start from the beginning; what character set encoding is the original data
> in?
> The error implies that it's not ISO-8859-1 (which does have some gaps
> where characters aren't valid...)
Well... I discovered the 'Set Code Page...' option in UltraEdit, the main
editor I use to code PHP. And it tells me my PHP code files are encoded in
'1252 (ANSI - Latin I)'. So, now my next question is... what would be the
correct first parameter for the iconv function to tell it that the original
data is '1252 (ANSI - Latin I)'. I've tried numerous strings, which include:
'1252 (ANSI - Latin I)'
'1252'
'1252 ANSI'
'1252-ANSI'
'ANSI-1252'
'ANSI 1252'
....and variations.
Is there any iconv encoding table with acceptable encodings I can consult?
Also, isn't '1252 (ANSI - Latin I)' just a pimped version of ISO-8859-1?
Although I'm still curious of this. Please read my reply to C. also.
Thanks.
"C." <co************@gmail.com> wrote in message
news:11**********************@e9g2000prf.googlegroups.com...
> On 13 Jun, 21:25, "amygdala" <nore...@noreply.com> wrote:
>> The way I test to see if the content is UTF8, is by opening the XML file in notepad and choose 'save as...'. Normally the coding option should be set to UTF8, but now it just shows ANSI.
> ROFL.
Yes, very amusing. :-/
> Try inserting a BOM in front of the content.
Ok, I did a little research on BOM. And came up with information that tells
me the BOM isn't particularly necessary for UTF-8. Then I ran a simple test
with utf8_encode:
<?php
echo utf8_encode( 'ï' );
?>
Which output looks like it works just fine:
ï
Since I've concluded (see my reply to Andy Hassall) that my files are
encoded in '1252 (ANSI - Latin I)' and this test file was also '1252
(ANSI - Latin I)', I guess it works. And I don't necessarily have to provide a
BOM and don't have to resort to iconv. Correct?
Or am I missing something vital here?
Thanks.
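A browser or editor only shows how it chose to interpret the bytes, so a test like the one above is easier to trust when it prints the bytes themselves. The following is a minimal sketch, assuming the source string really is one byte per character. It uses iconv() in place of utf8_encode(), which is deprecated as of PHP 8.2. The BOM part is optional and only illustrates C.'s suggestion.

```php
<?php
// "ï" as it sits in a 1252 / ISO-8859-1 encoded source file: one byte, 0xEF.
$latin1 = "\xEF";

// Convert to UTF-8; "ï" (U+00EF) becomes the two bytes C3 AF.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

echo bin2hex($latin1), "\n"; // ef
echo bin2hex($utf8), "\n";   // c3af

// Optional: the UTF-8 byte order mark is the three bytes EF BB BF.
// Prepending it lets BOM-sniffing tools label the file as UTF-8.
$bom     = "\xEF\xBB\xBF";
$withBom = $bom . $utf8;

echo bin2hex($withBom), "\n"; // efbbbfc3af
```

Seeing c3af in the hex output is what confirms the conversion, regardless of how the character is drawn on screen.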
On Thu, 14 Jun 2007 03:39:34 +0200, "amygdala" <no*****@noreply.com> wrote:
>> Start from the beginning; what character set encoding is the original data in? The error implies that it's not ISO-8859-1 (which does have some gaps where characters aren't valid...)
> Well... I discovered the 'Set Code Page...' option in UltraEdit, the main editor I use to code PHP. And it tells me my PHP code files are encoded in '1252 (ANSI - Latin I)'.
Well... again, that's not foolproof. It's generally not possible to
definitively detect the encoding of a file. You can work out whether it's
impossible to be in a particular encoding (invalid characters or byte
sequences), and you can make some guesses on character distribution or
spellings of words, but unless it's tagged in some way (like HTML and XML, or
through another channel like HTTP headers) then it's not certain.
"Windows Codepage 1252" is a Windows character set encoding that is similar,
but not exactly the same as ISO-8859-1. It (1252) differs on the location of
the Euro character, and has a few extra characters in a range that is reserved
in ISO-8859-1.
Do you have any Euro currency symbols in the file?
> So, now my next question is... what would be the correct first parameter for the iconv function to tell it that the original data is '1252 (ANSI - Latin I)'. I've tried numerous strings, which include:
> '1252 (ANSI - Latin I)' '1252' '1252 ANSI' '1252-ANSI' 'ANSI-1252' 'ANSI 1252'
> ...and variations.
> Is there any iconv encoding table with acceptable encodings I can consult?
http://www.gnu.org/software/libiconv/
You possibly want:
CP1252
>Also, isn't '1252 (ANSI - Latin I)' just a pimped version of ISO-8859-1?
I should read the entire message before typing ;-)
--
Andy Hassall :: an**@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
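To make the CP1252 suggestion concrete, here is a small sketch under a few assumptions: the source data really is Windows-1252, the installed iconv accepts the name "CP1252", and the file name is invented for the example. As far as I know, CP1252 places the Euro sign at byte 0x80, whereas in ISO-8859-1 that byte is a control code, so the two source charsets give different results for it.

```php
<?php
// Byte 0x80 is the Euro sign in CP1252.
$cp1252 = "Price: \x80 10";

// Treating the input as CP1252 yields U+20AC, encoded as E2 82 AC.
$utf8 = iconv('CP1252', 'UTF-8', $cp1252);
echo bin2hex(iconv('CP1252', 'UTF-8', "\x80")), "\n";     // e282ac

// Treating the same byte as ISO-8859-1 yields the control code U+0080 (C2 80).
echo bin2hex(iconv('ISO-8859-1', 'UTF-8', "\x80")), "\n"; // c280

// Hypothetical target file; the thread writes to '.' . SITEMAP_FILE instead.
$file = sys_get_temp_dir() . '/sitemap-example.xml';
file_put_contents($file, $utf8);
```

If the data contains no bytes in the 0x80 to 0x9F range, both source charsets produce the same UTF-8 output.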
> I'm trying to let PHP write a 'sitemap.xml' sitemap for Google and other
> searchengines. It's working, except that the content in the XML file doesn't
> seem to be UTF8. (Which it should be, judging by the information given on
> Google's webmaster helpcenter).
How can you tell? YOU tell the system what encoding is used. The system
rarely tells you, as bytes can be perfectly valid text in a lot of
encodings and look very different in each of them.
Even if the system tells you, it usually does so separately from the
text itself. Which is obvious, because you need the encoding to be able
to read the text! In webpages and e-mail, for example, headers are used
to set the encoding of the data.
I suggest you search the net for encodings and how to work with them.
This is a good start: http://www.joelonsoftware.com/articles/Unicode.html
Good luck with the onions,
--
Willem Bogaerts
Application smith
Kratz B.V. http://www.kratz.nl/
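The two points above (bytes can only be ruled out as a given encoding, and the encoding is declared alongside the text) can be sketched in PHP as follows. This is an illustration, not the original poster's code: looksLikeUtf8() is a hypothetical helper, and the URL is a placeholder. The check relies on PCRE's /u modifier, which makes preg_match() fail on subject strings that are not valid UTF-8.

```php
<?php
// Hypothetical helper: true if $bytes is a valid UTF-8 byte sequence.
// An empty pattern with /u matches any valid UTF-8 string; for invalid
// input preg_match() returns false instead of 1.
function looksLikeUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// For XML, the declaration is where YOU state the encoding.
$sitemap = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
    . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n"
    . "  <url><loc>http://www.example.com/na\xC3\xAFve</loc></url>\n"
    . '</urlset>' . "\n";

var_dump(looksLikeUtf8($sitemap)); // bool(true)
var_dump(looksLikeUtf8("\xEF"));   // bool(false): a lone ISO-8859-1 "ï" byte
```

A passing check only means the bytes are possible UTF-8; it does not prove that UTF-8 was what the author intended.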
"Willem Bogaerts" <w.********@kratz.maardanzonderditstuk.nl> wrote in
message news:46*********************@news.xs4all.nl...
>> I'm trying to let PHP write a 'sitemap.xml' sitemap for Google and other searchengines. It's working, except that the content in the XML file doesn't seem to be UTF8. (Which it should be, judging by the information given on Google's webmaster helpcenter).
> How can you tell? YOU tell the system what encoding is used. The system
> rarely tells you, as bytes can be perfectly valid text in a lot of
> encodings and look very different in each of them.
Yes, you're right of course, what was I thinking.
> Even if the system tells you, it usually does so separately from the
> text itself. Which is obvious, because you need the encoding to be able
> to read the text! In webpages and e-mail, for example, headers are used
> to set the encoding of the data.
> I suggest you search the net for encodings and how to work with them.
> This is a good start:
> http://www.joelonsoftware.com/articles/Unicode.html
Great article. Thanks for the pointer.
> Good luck with the onions,
Hopefully that won't be necessary anymore.
Cheers.
--
Willem Bogaerts
Application smith
Kratz B.V. http://www.kratz.nl/