
[DOM XML] How to encode node content (accented characters)

Hello,

I'm trying to generate an RSS newsfeed using the DOM XML functions. However, I
can't find a way to use accented characters. I even tried to specify a
character encoding, but it doesn't solve the problem.

Error I get :
XML Parsing Error: not well-formed
Location: news.php?action=syndicate&format=rss2
Line Number 1, Column 102:<?xml version="1.0" encoding="ISO-8859-15"?><rss
version="2.0"><channel><title>Website title - News (
----------------------------------------------------------------------------
-----------------------------------------------------------^

The character that breaks well-formedness is an accented character, as you
can see in the following code sample:

<?php

header ('content-type: text/xml');

echo ('<?xml version="1.0" encoding="ISO-8859-15"?>');

$dom_doc = domxml_new_doc ('1.0');

$rss_el = $dom_doc->create_element ('rss');
$rss_el->set_attribute ('version', '2.0');
$rss_el = $dom_doc->append_child ($rss_el);

$channel_el = $dom_doc->create_element ('channel');
$channel_el = $rss_el->append_child ($channel_el);

$title_el = $dom_doc->create_element ('title');
$title_el = $channel_el->append_child ($title_el);
$title_el->set_content ('Website title - News (àéèù)');

echo ($dom_doc->html_dump_mem ());

?>

Removing the accented characters from the title generates a well-formed
XML file. Note that I also tried to encode the characters using the
htmlentities function, but it didn't change anything.

--
Jean-Marc.

Jul 17 '05 #1
6 Replies


Jean-Marc Molina <jm******@pasdepourriel-free.fr> wrote:
Error I get :
XML Parsing Error: not well-formed
Location: news.php?action=syndicate&format=rss2
Line Number 1, Column 102:<?xml version="1.0" encoding="ISO-8859-15"?><rss
version="2.0"><channel><title>Website title - News (
----------------------------------------------------------------------------
-----------------------------------------------------------^

The character that breaks well-formedness is an accented character, as you
can see in the following code sample:
....
$title_el->set_content ('Website title - News (àéèù)');
....
Removing the accented characters from the title generates a well-formed
XML file. Note that I also tried to encode the characters using the
htmlentities function, but it didn't change anything.


Disclaimer: I haven't ever used xmldomdoc in php.

If you are trying to create well-formed XML you should use UTF-8 as the
encoding, since UTF-8 and UTF-16 are the only encodings that XML processors
are required to support. That is also your problem: your XML document is
handled as UTF-8 since you haven't told the DOM functions otherwise (and I
can't find how to do that in the domxml reference).

htmlentities doesn't work either because there are only 5 XML entities by
default: &amp; &apos; &quot; &gt; &lt;
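
To illustrate the point about entities, here is a minimal sketch. It assumes PHP 5.4 or later, and the sample string and variable names are made up for illustration. htmlentities emits HTML-only names such as &eacute;, which XML does not predefine, whereas htmlspecialchars only escapes characters that are covered by the predefined XML entities.

```php
<?php
// Hypothetical sample title; "\xC3\xA9" is the UTF-8 byte sequence for "é".
$title = "News (\xC3\xA9) & more";

// htmlentities() replaces "é" with the HTML named entity &eacute;,
// which an XML parser without a DTD does not know.
$html_escaped = htmlentities($title, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() leaves "é" alone and only escapes & < > " '.
$xml_escaped = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

echo $html_escaped, "\n";
echo $xml_escaped, "\n";
```

The second form is the one that is safe to paste into hand-written XML, provided the document itself is served as UTF-8.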

IMHO the best solution would be to translate your string to UTF-8:
http://nl.php.net/manual/en/function...t-encoding.php

(be sure to tell it the source is ISO-8859-15, or else the EUR symbol will
be converted to the generic currency symbol).

Or use http://nl.php.net/manual/en/function.utf8-encode.php after
manually encoding EUR.
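
A minimal sketch of the conversion suggested above, assuming the mbstring extension is loaded. The sample bytes and variable names are hypothetical. It shows why the source encoding matters: byte 0xA4 is the euro sign in ISO-8859-15 but the generic currency sign in ISO-8859-1.

```php
<?php
// Hypothetical ISO-8859-15 input: 0xA4 is the euro sign,
// 0xE0 0xE9 0xE8 0xF9 are the accented letters a-grave, e-acute, e-grave, u-grave.
$latin9 = "Prix : 10 \xA4 (\xE0\xE9\xE8\xF9)";

// Correct: name the real source encoding.
$utf8 = mb_convert_encoding($latin9, 'UTF-8', 'ISO-8859-15');

// Mislabelled as ISO-8859-1: 0xA4 becomes the generic currency sign instead.
$wrong = mb_convert_encoding($latin9, 'UTF-8', 'ISO-8859-1');

echo $utf8, "\n";
echo $wrong, "\n";
```

The accented letters come out the same either way, since they occupy the same positions in both encodings; only the handful of positions that ISO-8859-15 redefines (such as the euro sign) differ.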

--

Daniel Tryba

Jul 17 '05 #2

Daniel Tryba a écrit/wrote :
htmlentities doesn't work either because there are only 5 XML entities by
default: &amp; &apos; &quot; &gt; &lt;
I didn't know that. I solved my problem by calling htmlentities 2 times, to
remove the junk :).
IMHO the best solution would be to translate your string to UTF-8:
http://nl.php.net/manual/en/function...t-encoding.php


That's another solution, a better one, thanks!

<?php

// ...
// Convert the stored title from ISO-8859-15 to UTF-8 before handing it to DOM XML.
$title = mb_convert_encoding (stripslashes ($newsitem ['title']), 'UTF-8',
'ISO-8859-15');
// ...

?>

From ISO-8859-15 to UTF-8.
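
For readers on a newer PHP, the same idea can be sketched with the PHP 5+ DOM extension (DOMDocument). This is an assumption on my part, not the PHP 4 domxml API used earlier in the thread, and the hard-coded title stands in for the database value. The document is built from UTF-8 strings and the serializer writes the XML declaration itself.

```php
<?php
// Assumption: PHP 5+ DOM extension (DOMDocument), not the PHP 4 domxml API.
// Hypothetical input: a title stored as ISO-8859-15 bytes.
$raw_title = "Website title - News (\xE0\xE9\xE8\xF9)";

// DOM strings must be UTF-8, so convert before handing text to the document.
$title = mb_convert_encoding($raw_title, 'UTF-8', 'ISO-8859-15');

// The constructor records the encoding that saveXML() will declare and emit.
$doc = new DOMDocument('1.0', 'UTF-8');

$rss = $doc->createElement('rss');
$rss->setAttribute('version', '2.0');
$doc->appendChild($rss);

$channel = $doc->createElement('channel');
$rss->appendChild($channel);

$title_el = $doc->createElement('title');
// createTextNode() escapes markup characters such as & and <.
$title_el->appendChild($doc->createTextNode($title));
$channel->appendChild($title_el);

// saveXML() writes the XML declaration itself; no manual echo is needed.
$xml = $doc->saveXML();
echo $xml;
```

When serving this over HTTP, the Content-Type header would normally name the same charset, for example text/xml; charset=UTF-8.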

--
Jean-Marc.

Jul 17 '05 #3

Daniel Tryba a écrit/wrote :
If you are trying to create wellformed XML you should use UTF8 as the
encoding, since that is about the only encoding that XML utils must
support. That is also your problem, your xml doc. is in UTF8 since you
haven't told anyone otherwise (and I can't find how to do that in the
domxml reference).


Do you know of any good UTF-8 coding editor ? I develop using HTML-Kit and
jEdit but they don't support UTF-8 editing. However I'm not sure application
servers can handle UTF-8 scripts. Can they ?

Thanks again for all your help.

--
Jean-Marc.

Jul 17 '05 #4

Jean-Marc Molina <jm******@pasdepourriel-free.fr> wrote:
Do you know of any good UTF-8 coding editor ?
I never ever use anything other than ASCII :)
I develop using HTML-Kit and
jEdit but they don't support UTF-8 editing. However I'm not sure application
servers can handle UTF-8 scripts. Can they ?


Don't know about HTML-Kit. jEdit is written in Java, which uses 16-bit
Unicode strings internally, and UTF-8 is just another encoding of the same
characters... loading and saving UTF-8 data should not be any problem IMHO.

--

Daniel Tryba

Jul 17 '05 #5

Jean-Marc Molina <jm******@pasdepourriel-free.fr> wrote:
However I'm not sure application servers can handle UTF-8 scripts. Can
they ?


Forgot to answer this one.

http://nl.php.net/mb-string should make it possible to use various
encodings. mbstring.internal_encoding and mbstring.script_encoding
should be the key settings...

--

Daniel Tryba

Jul 17 '05 #6

Daniel Tryba <ne****************@canopus.nl> wrote:
htmlenties doesn't work either because there are only 5 xmlentities by
default: &amp; &apos; &quot; &gt; &lt;

However, you can still use numeric character references: &#...;
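
A minimal sketch of that approach, assuming the mbstring extension and a UTF-8 input string. The sample title and variable names are hypothetical. mb_encode_numericentity turns every non-ASCII character into a decimal numeric reference, which any XML parser accepts without a DTD.

```php
<?php
// Hypothetical UTF-8 input: "News (" + a-grave, e-acute, e-grave, u-grave + ")".
$title = "News (\xC3\xA0\xC3\xA9\xC3\xA8\xC3\xB9)";

// Convert map: [first code point, last code point, offset, mask].
// This one covers every code point above ASCII.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$ascii_safe = mb_encode_numericentity($title, $convmap, 'UTF-8');

echo $ascii_safe, "\n"; // News (&#224;&#233;&#232;&#249;)
```

The result is plain ASCII, so it survives regardless of the encoding declared for the document. It is meant for XML assembled as a string; text passed to a DOM text node would have its & escaped again.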

--
Simon Stienen <http://dangerouscat.net> <http://slashlife.de>
»What you do in this world is a matter of no consequence,
The question is, what can you make people believe that you have done.«
-- Sherlock Holmes in "A Study in Scarlet" by Sir Arthur Conan Doyle
Jul 17 '05 #7
