[DOM XML] How to encode node content (accented characters)

Jean-Marc Molina
#1: Jul 17 '05
Hello,

I'm trying to generate an RSS news feed using the DOM XML functions. However, I
can't find a way to use accented characters. I even tried to specify a
character encoding, but it doesn't solve the problem.

The error I get:
XML Parsing Error: not well-formed
Location: news.php?action=syndicate&format=rss2
Line Number 1, Column 102:
<?xml version="1.0" encoding="ISO-8859-15"?><rss version="2.0"><channel><title>Website title - News (
-----------------------------------------------------------------------------------------------------^

The offending character is an accented one, as you can see in the
following code sample:

<?php

header ('content-type: text/xml');

echo ('<?xml version="1.0" encoding="ISO-8859-15"?>');

$dom_doc = domxml_new_doc ('1.0');

$rss_el = $dom_doc->create_element ('rss');
$rss_el->set_attribute ('version', '2.0');
$rss_el = $dom_doc->append_child ($rss_el);

$channel_el = $dom_doc->create_element ('channel');
$channel_el = $rss_el->append_child ($channel_el);

$title_el = $dom_doc->create_element ('title');
$title_el = $channel_el->append_child ($title_el);
$title_el->set_content ('Website title - News (àéèù)');

echo ($dom_doc->html_dump_mem ());

?>

Removing the accented characters from the title generates a well-formed
XML file. Note that I also tried to encode the characters using the
htmlentities function, but it didn't change anything.

--
Jean-Marc.

Daniel Tryba
#2: Jul 17 '05

Jean-Marc Molina <jmmolina@pasdepourriel-free.fr> wrote:
> Error I get :
> XML Parsing Error: not well-formed
> Location: news.php?action=syndicate&format=rss2
> Line Number 1, Column 102:
> <?xml version="1.0" encoding="ISO-8859-15"?><rss version="2.0"><channel><title>Website title - News (
> -----------------------------------------------------------------------------------------------------^
>
> The not well-formed character is an accentuated character, as you can see on
> the following code sample :
....
> $title_el->set_content ('Website title - News (àéèù)');
....
> Removing the accentuated characters from the title generates a well formed
> XML file. Note that I also tried to encode the characters using the
> htmlentities function but it didn't change anything.

Disclaimer: I have never used domxml in PHP.

If you are trying to create well-formed XML, you should use UTF-8 as the
encoding, since it is (together with UTF-16) the only encoding every XML
processor is required to support. That is also your problem: your XML
document is treated as UTF-8 because you haven't told anyone otherwise (and
I can't find how to do that in the domxml reference).
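For reference, PHP 5 replaced domxml with the DOM extension, where the document encoding can be declared as the second argument of the DOMDocument constructor. A minimal sketch, assuming PHP 7+ with ext/dom: strings handed to the DOM must still be UTF-8, and libxml converts them to the declared encoding only when serializing.

```php
<?php
// Assumes PHP 7+ with the bundled DOM extension (the successor of domxml).
// The second constructor argument is the encoding written to the XML
// declaration and used by saveXML(); DOM strings themselves stay UTF-8.
$doc = new DOMDocument('1.0', 'ISO-8859-15');

$rss = $doc->appendChild($doc->createElement('rss'));
$rss->setAttribute('version', '2.0');
$channel = $rss->appendChild($doc->createElement('channel'));
$title = $channel->appendChild($doc->createElement('title'));

// Input must be UTF-8: "\u{E9}" is é (PHP 7+ escape syntax).
$title->appendChild($doc->createTextNode("Website title - News (\u{E9})"));

// Serialized in ISO-8859-15, matching the declared encoding.
$xml = $doc->saveXML();
echo $xml;
```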

htmlentities doesn't work either, because XML has only 5 predefined
entities: &amp; &apos; &quot; &gt; &lt;
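To see the difference, here is a small sketch, assuming PHP 7+ with ext/dom (is_well_formed() is a hypothetical helper): htmlentities() turns é into the HTML-only entity &eacute;, which an XML parser rejects when there is no DTD, whereas htmlspecialchars() with the ENT_XML1 flag (PHP 5.4+) only produces the five predefined entities.

```php
<?php
// Hypothetical helper: true if libxml accepts $xml as well-formed.
// Errors are collected silently instead of being printed as warnings.
function is_well_formed(string $xml): bool
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

$title = "News (\u{E9})"; // UTF-8 string "News (é)"

// htmlentities() produces the HTML named entity &eacute;, which XML
// does not predefine, so the parser rejects it without a DTD.
$htmlEscaped = htmlentities($title, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() with ENT_XML1 only escapes & < > " ' and
// leaves é as a plain UTF-8 character.
$xmlEscaped = htmlspecialchars($title, ENT_QUOTES | ENT_XML1, 'UTF-8');

var_dump(is_well_formed("<title>$htmlEscaped</title>")); // bool(false)
var_dump(is_well_formed("<title>$xmlEscaped</title>"));  // bool(true)
```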

IMHO the best solution would be to convert your string to UTF-8:
http://nl.php.net/manual/en/function...t-encoding.php

(be sure to tell it the source is ISO-8859-15, or else the euro sign will
be replaced by the generic currency sign ¤).
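The euro sign problem is easy to reproduce. A minimal sketch, assuming the mbstring extension: byte 0xA4 is € in ISO-8859-15 but ¤ in ISO-8859-1, so the source encoding you pass decides which character you get. The sample string is made up for illustration.

```php
<?php
// Minimal sketch, assuming the mbstring extension is loaded.
// Hypothetical sample: the bytes a Latin-9 (ISO-8859-15) editor would save
// for "Price: 10 € (àéèù)". Byte 0xA4 is € in Latin-9 but ¤ in Latin-1.
$latin9 = "Price: 10 \xA4 (\xE0\xE9\xE8\xF9)";

// Correct: name the real source encoding.
$asLatin9 = mb_convert_encoding($latin9, 'UTF-8', 'ISO-8859-15');
// Wrong: the same bytes read as ISO-8859-1 (what utf8_encode() assumes).
$asLatin1 = mb_convert_encoding($latin9, 'UTF-8', 'ISO-8859-1');

// Character 10 is the currency sign in both results.
echo bin2hex(mb_substr($asLatin9, 10, 1, 'UTF-8')), "\n"; // e282ac (U+20AC, euro sign)
echo bin2hex(mb_substr($asLatin1, 10, 1, 'UTF-8')), "\n"; // c2a4 (U+00A4, currency sign)
```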

Or use http://nl.php.net/manual/en/function.utf8-encode.php after
manually encoding EUR.
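utf8_encode() always assumes ISO-8859-1 (and is deprecated as of PHP 8.2), which is why the euro sign needs manual handling. As an alternative to that recipe rather than a reproduction of it, here is a hypothetical hand-rolled converter, latin9_to_utf8(), that patches the eight byte positions where ISO-8859-15 differs from ISO-8859-1 and then encodes each code point as UTF-8. It is an illustration, not a substitute for a tested library.

```php
<?php
// Hypothetical helper: encode one Unicode code point below U+10000 as UTF-8.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp); // 1 byte (ASCII)
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F)); // 2 bytes
    }
    return chr(0xE0 | ($cp >> 12)) // 3 bytes
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: convert ISO-8859-15 (Latin-9) bytes to UTF-8.
// Latin-9 matches ISO-8859-1 (byte value == code point) except at these
// eight positions, which is exactly where utf8_encode() goes wrong.
function latin9_to_utf8(string $bytes): string
{
    static $overrides = [
        0xA4 => 0x20AC, // euro sign (ISO-8859-1 has the currency sign here)
        0xA6 => 0x0160, // S with caron
        0xA8 => 0x0161, // s with caron
        0xB4 => 0x017D, // Z with caron
        0xB8 => 0x017E, // z with caron
        0xBC => 0x0152, // OE ligature
        0xBD => 0x0153, // oe ligature
        0xBE => 0x0178, // Y with diaeresis
    ];
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($bytes[$i]);
        $out .= codepoint_to_utf8($overrides[$byte] ?? $byte);
    }
    return $out;
}

echo bin2hex(latin9_to_utf8("10 \xA4")), "\n"; // 313020e282ac
```

The output bytes 31 30 20 e2 82 ac are "10 €" in UTF-8.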

--

Daniel Tryba

Jean-Marc Molina
#3: Jul 17 '05

Daniel Tryba wrote:
> htmlentities doesn't work either because there are only 5 xmlentities by
> default: &amp; &apos; &quot; &gt; &lt;

I didn't know that. I solved my problem by calling htmlentities twice, to
remove the junk :).

> IMHO the best solution would be to translate your string to UTF8:
> http://nl.php.net/manual/en/function...t-encoding.php

That's another solution, and a better one. Thanks!

<?php

....
$title = mb_convert_encoding (stripslashes ($newsitem ['title']), 'UTF-8',
'ISO-8859-15');
....

?>

From ISO-8859-15 to UTF-8.
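Putting the pieces together, a sketch of the whole feed using the DOM extension that replaced domxml might look like this. It assumes PHP 5.4+ with ext/dom and mbstring, and $newsitem is a hypothetical stand-in for the database row from the snippet above, holding ISO-8859-15 bytes.

```php
<?php
// Minimal sketch, assuming PHP 5.4+ with ext/dom and mbstring.
// $newsitem is a hypothetical stand-in for the database row; its title
// holds ISO-8859-15 bytes ("àéèù" written as \xE0\xE9\xE8\xF9).
$newsitem = ['title' => "Website title - News (\xE0\xE9\xE8\xF9)"];

// Convert ISO-8859-15 -> UTF-8 before the text reaches the DOM.
$title = mb_convert_encoding(stripslashes($newsitem['title']), 'UTF-8', 'ISO-8859-15');

// Build the feed as a UTF-8 document.
$doc = new DOMDocument('1.0', 'UTF-8');
$rss = $doc->appendChild($doc->createElement('rss'));
$rss->setAttribute('version', '2.0');
$channel = $rss->appendChild($doc->createElement('channel'));
$titleEl = $channel->appendChild($doc->createElement('title'));
// Text nodes are escaped by the DOM (& < >), so no htmlentities() call.
$titleEl->appendChild($doc->createTextNode($title));

// saveXML() emits the XML declaration itself.
$xml = $doc->saveXML();

// Only send the header when running under a web server.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/xml; charset=UTF-8');
}
echo $xml;
```

Because saveXML() writes the declaration, the separate echo of an XML declaration from the first script is no longer needed.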

--
Jean-Marc.

Jean-Marc Molina
#4: Jul 17 '05

Daniel Tryba wrote:
> If you are trying to create wellformed XML you should use UTF8 as the
> encoding, since that is about the only encoding that XML utils must
> support. That is also your problem, your xml doc. is in UTF8 since you
> haven't told anyone otherwise (and I can't find how to do that in the
> domxml reference).

Do you know of any good UTF-8 code editor? I develop using HTML-Kit and
jEdit, but they don't support UTF-8 editing. Also, I'm not sure application
servers can handle UTF-8 scripts. Can they?

Thanks again for all your help.

--
Jean-Marc.

Daniel Tryba
#5: Jul 17 '05

Jean-Marc Molina <jmmolina@pasdepourriel-free.fr> wrote:
> Do you know of any good UTF-8 coding editor?

I never ever use anything other than ASCII :)

> I develop using HTML-Kit and
> jEdit but they don't support UTF-8 editing. However I'm not sure application
> servers can handle UTF-8 scripts. Can they?

Don't know about HTML-Kit. jEdit is written in Java, which uses UCS-2
internally, and UTF-8 is just another encoding of the same Unicode
characters, so loading and saving UTF-8 data should not be a problem IMHO.

--

Daniel Tryba

Daniel Tryba
#6: Jul 17 '05

Jean-Marc Molina <jmmolina@pasdepourriel-free.fr> wrote:
> However I'm not sure application servers can handle UTF-8 scripts. Can
> they?

Forgot to answer this one.

http://nl.php.net/mb-string should make it possible to use various
encodings. mbstring.internal_encoding and mbstring.script_encoding
should be the key settings.
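Roughly speaking, mb_internal_encoding() is the runtime counterpart of mbstring.internal_encoding, and it is what makes the mb_* functions treat a UTF-8 script's strings as characters rather than bytes. A minimal sketch, assuming mbstring is loaded and the file is saved as UTF-8. Note that later PHP releases changed these settings: mbstring.script_encoding was replaced by zend.script_encoding in PHP 5.4, and mbstring.internal_encoding was deprecated in PHP 5.6 in favour of default_charset.

```php
<?php
// Minimal sketch, assuming mbstring is loaded and this file is saved as UTF-8.
// mb_internal_encoding() sets at runtime what mbstring.internal_encoding
// sets in php.ini: the default encoding used by the mb_* functions.
mb_internal_encoding('UTF-8');

$word = "\u{E9}t\u{E9}"; // "été": 3 characters, 5 bytes in UTF-8

echo strlen($word), "\n";        // 5 (strlen() counts bytes)
echo mb_strlen($word), "\n";     // 3 (mb_strlen() counts characters)
echo mb_strtoupper($word), "\n"; // ÉTÉ
```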

--

Daniel Tryba

Simon Stienen
#7: Jul 17 '05

Daniel Tryba <news_comp.lang.php@canopus.nl> wrote:
> htmlentities doesn't work either because there are only 5 xmlentities by
> default: &amp; &apos; &quot; &gt; &lt;
However, you can still use numeric character references: &#...; (for
example &#233; for é).
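A minimal sketch of that approach, assuming mbstring: mb_encode_numericentity() with a map covering U+0080 and above turns every non-ASCII character into a decimal reference, so the output is plain ASCII and valid under any ASCII-compatible declared encoding. to_ascii_xml() is a hypothetical helper name.

```php
<?php
// Minimal sketch, assuming mbstring. to_ascii_xml() is a hypothetical
// helper: it replaces every code point from U+0080 upward with a decimal
// numeric character reference such as &#233;, leaving pure ASCII.
// It does NOT escape & < > itself; apply htmlspecialchars() first if needed.
function to_ascii_xml(string $utf8): string
{
    // Map format: [start, end, offset, mask]; offset 0 and a full mask
    // keep the code point unchanged.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

echo to_ascii_xml("News (\u{E0}\u{E9}\u{E8}\u{F9}, \u{20AC})"), "\n";
// News (&#224;&#233;&#232;&#249;, &#8364;)
```

These references only survive when you write the XML text yourself; text passed to a DOM text node such as createTextNode() is escaped again, turning the & into &amp;.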

--
Simon Stienen <http://dangerouscat.net> <http://slashlife.de>
»What you do in this world is a matter of no consequence,
The question is, what can you make people believe that you have done.«
-- Sherlock Holmes in "A Study in Scarlet" by Sir Arthur Conan Doyle