XML CDATA special characters

Hi.

I'm trying to develop a program that uses XML files to store data. I'm using
Windows XP, Apache 1.3.29 and PHP 4.3.4.

Right now the XML file is read using the xml_parser_create(),
xml_set_element_handler() etc. functions. I have difficulties with special
characters in the data.

I found information on "<![CDATA[ special chars here ]]>", UTF-8, XML DOM,
htmlentities(), and more, but I'm confused with all these terms and their
meaning.

I think I should use CDATA sections anyhow, right? Or is this UTF-8 a way to
use special characters without bothering the XML parser?

Long ago I used a DOM in Perl and liked it, is it hard to use the PHP XML
DOM and is it (part of a) solution to my problem?

Right now (with the xml_parser_ functions) my program outputs something like
<img alt="Data from XML file, sometimes with "quotes".">
to the browser, which isn't right because of the early end-quote. Where and
how should I avoid this? This is where htmlentities fits in, right? And I
once read something about PHP settings dealing with HTML characters.

It's not that I'm lazy, but there's a lot of information on a lot of
interrelated subjects. Who can help me out here please?

Regards,

- John van Terheijden, the Netherlands
Jul 17 '05 #1
For a start, if you are "creating" XML content, then you need to use the
DOM API and not the SAX API. As far as I am aware, the SAX API is just
for "reading" XML data and not for writing it. Someone please correct me
if I am wrong.

The DOM API will conveniently do all special character escaping for you,
so you don't have to worry about using functions *like* htmlentities().
On that point, basic XML has only 5 predefined entities. Off the top of
my head, I think they are:
> -- &gt;
< -- &lt;
" -- &quot;
& -- &amp;
[insert fifth one here]

The other one escapes me (no pun intended). If you try to use HTML
entities, then you will likely create invalid XML documents because HTML
has entities that are "undefined" in the default XML set.

When you use an XML parser (be it SAX or DOM) to get the data back from
the XML storage files, everything (including entities) will be converted
back (un-escaped). So you really do not need to use CDATA sections.
CDATA sections do have their uses, but they are strictly necessary in
only a very few cases.
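
A minimal sketch of that round trip, for anyone who wants to try it. It assumes PHP 5 or later with the bundled DOM extension (DOMDocument), not the PHP 4 DOM XML functions discussed in this thread, and the sample string is made up:

```php
<?php
// Sketch only. DOMDocument is the PHP 5+ DOM extension (an assumption; the
// thread itself is about PHP 4). $raw is made-up sample data.
$raw = 'Tom & Jerry say "hi" <loudly>';

// Writing: hand the DOM plain text and let it do the escaping on output.
$doc  = new DOMDocument('1.0', 'UTF-8');
$item = $doc->appendChild($doc->createElement('item'));
$item->appendChild($doc->createTextNode($raw));
$xml = $doc->saveXML();
echo $xml;

// Reading: the parser turns the entity references back into characters.
$roundTrip = new DOMDocument();
$roundTrip->loadXML($xml);

// The same data written as a CDATA section parses to the same text.
$cdata = new DOMDocument();
$cdata->loadXML('<item><![CDATA[' . $raw . ']]></item>');

var_dump($roundTrip->documentElement->textContent === $raw); // bool(true)
var_dump($cdata->documentElement->textContent === $raw);     // bool(true)
```

Either way the application sees the same string, which is why CDATA is a convenience for whoever writes the file by hand rather than something the parser needs.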

SPECIAL NOTE ON XSL STYLESHEETS:
If you are using XSL templates to extract HTML markup contained
(escaped) in the XML storage files, use the disable-output-escaping
attribute of the value-of directive to disable output escaping. This is
useful if you have done something like this...
$element->set_content($htmlSource);
and you wish the output tree to contain unescaped HTML.

As for character encoding (UTF-8 etc.), it depends on what sort of data
you are putting in there. Odds are you needn't concern yourself with
this unless you know that your source data is UTF-16 or something. Just
try using the DOM XML functions and see how you go.

Jul 17 '05 #2
Terence wrote:
On that point, basic XML only has 5 pre-defined default entities. And
off the top of my head, I think they are:
> -- &gt;
< -- &lt;
" -- &quot;
& -- &amp;
[insert fifth one here]

The other one escapes me (no pun intended).


The other one was introduced by XML 1.0, and doesn't exist in HTML 4 or
earlier. It's U+0027 APOSTROPHE ("'"), with an entity reference of
&apos;, a decimal character reference of &#39;, and a hexadecimal
character reference of &#x27; (XML 1.0 sec. 4.6).

If you try and use HTML entities, then you will likely create invalid
XML documents because HTML has entities that are "undefined" in the
default XML set.


OK.

On the other hand, htmlspecialchars converts, at most, just those
five characters to their respective entity references (or a decimal
character reference in the case of the ASCII apostrophe, since there
is no entity reference defined for it in HTML 4 or earlier). The
ENT_QUOTES mode converts both single and double quotes; the default
mode, ENT_COMPAT, converts only double quotes.

http://www.php.net/manual/en/functio...ecialchars.php
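
Applied to the <img alt="..."> problem from the first post, that could look roughly like the sketch below. The variable names and the UTF-8 charset are assumptions, not something from the original program. Note that newer PHP releases (8.1 and later) already include ENT_QUOTES in the default flags; passing it explicitly keeps the behaviour the same everywhere.

```php
<?php
// Hypothetical value, as it comes back (already un-escaped) from the XML parser.
$alt = 'Data from XML file, sometimes with "quotes".';

// Escape at the moment the text is written into HTML, not when it is stored.
// ENT_QUOTES covers both " and '; 'UTF-8' is an assumed charset for $alt.
$safe = htmlspecialchars($alt, ENT_QUOTES, 'UTF-8');

echo '<img alt="' . $safe . '">', "\n";
// <img alt="Data from XML file, sometimes with &quot;quotes&quot;.">
```

The XML file keeps the plain data; only the HTML output step needs to know about HTML escaping.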

--
Jock
Jul 17 '05 #3
I didn't mention SAX, is that the standard PHP parser I'm using now? I
thought it was Expat. Thanks for making this even more confusing ;)

Ok, I'll just dive into DOM now and see where this will all end up. I'll
probably come across all the terms again, in time. B.t.w. I don't understand
much of your XSL note, probably because I know very little about XSL. I'm
using XML to store data while avoiding databases.

Thanks!
Jul 17 '05 #4
John van Terheijden wrote:
I didn't mention SAX, is that the standard PHP parser I'm using now? I
thought it was Expat. Thanks for making this even more confusing ;)

Yeah, it's a bit like that. I didn't want to include too much
explanation, or else I'd be in danger of writing a huge article. Trust me,
restraint is a good thing for me. When you're on the newbie end of a
technology, then it's best just to pretend you never read/heard the
stuff that confused you (initially of course).

Simple API for XML (SAX) is indeed the model behind what PHP inadequately
names the "XML extension". And yes, it is based on Expat (a product
name), an event-driven parser in the SAX style. SAX is a standard API;
Expat is a product that follows that model.

DOM is a standard; PHP uses the libxml product, which implements that
standard. PHP5 is slated to use libxml2, which is very exciting indeed :)
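
To make the event-driven side concrete, here is a rough sketch using the same xml_parser_create() family of functions mentioned in the first post. The sample XML and the $text accumulator are made up, and the closures assume PHP 5.3 or later (on PHP 4 you would pass function names as strings instead):

```php
<?php
// Sketch only: $xml and $text are made-up names for illustration.
$xml = '<item>Tom &amp; Jerry <![CDATA[<loudly>]]></item>';

$text   = '';
$parser = xml_parser_create('UTF-8');

// Events arrive as the parser walks the input; no tree is built.
// Element names are upper-cased by default (case folding).
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) { echo "start $name\n"; },
    function ($parser, $name) { echo "end $name\n"; }
);

// Character data may arrive in several chunks, so append rather than assign.
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$text) { $text .= $data; }
);

if (!xml_parse($parser, $xml, true)) {
    die(xml_error_string(xml_get_error_code($parser)));
}

echo $text, "\n";
```

The handler receives the text already un-escaped, whether it was written with entity references or inside a CDATA section. With DOM you get the whole tree at once instead of a stream of callbacks.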

If you don't know anything about XSLT, then ignore the tip I gave to
XSLT users who might take my advice on the [no need to use] CDATA issue.
XSLT is a whole new kettle of fish, don't go there until you have a firm
grasp on XML.

I recommend familiarising yourself with the XML "infoset". You will find
the "infoset" standard on the W3C website. Do not panic, it is a
relatively short document that can be skimmed quite readily. Don't get
depressed if it all doesn't stick the first time. At least *familiarise*
yourself with the *concept* of the infoset. There should be an
introduction/primer type article there.

Jul 17 '05 #5
Thanks for the reply.

"Terence" <tk******@fastmail.fm> wrote in message
news:3fbaada8$1@herald...
John van Terheijden wrote:
I didn't mention SAX, is that the standard PHP parser I'm using now? I
thought it was Expat. Thanks for making this even more confusing ;)

Yeah, it's a bit like that. I didn't want to include too much
explanation, or else I'd be in danger of writing a huge article. Trust me,
restraint is a good thing for me. When you're on the newbie end of a
technology, then it's best just to pretend you never read/heard the
stuff that confused you (initially of course).

I agree. It's always hard to choose between learning by reading or by
practice. Most of the time, "the other one" would have been faster.

Simple API for XML (SAX) is indeed the model behind what PHP inadequately
names the "XML extension". And yes, it is based on Expat (a product
name), an event-driven parser in the SAX style. SAX is a standard API;
Expat is a product that follows that model.

DOM is a standard; PHP uses the libxml product, which implements that
standard. PHP5 is slated to use libxml2, which is very exciting indeed :)

Thanks for clearing that up! Btw, I think I like how DOM works better than
how SAX works. However, I believe that depends very much on the type of
XML data involved.

If you don't know anything about XSLT, then ignore the tip I gave to
XSLT users who might take my advice on the [no need to use] CDATA issue.
XSLT is a whole new kettle of fish, don't go there until you have a firm
grasp on XML.

ok :)

I recommend familiarising yourself with the XML "infoset". You will find
the "infoset" standard on the W3C website. Do not panic, it is a
relatively short document that can be skimmed quite readily. Don't get
depressed if it all doesn't stick the first time. At least *familiarise*
yourself with the *concept* of the infoset. There should be an
introduction/primer type article there.


I had a quick look and will read it.

Thanks!
Jul 17 '05 #6
"John van Terheijden" <john-foobar-nl> wrote in message news:<3f***********************@news.versatel.net> ...
Hi.

I'm trying to develop a program that uses XML files to store data. I'm using
Windows XP, Apache 1.3.29 and PHP 4.3.4.


I can't understand why people mess with XML when PHP with a
simple database (like MySQL, PostgreSQL or SQLite) can do the job
better.

XML can be used effectively to share data between two domains.
But there are people who plow through XML within their own domain; I
have also seen a number of people who dump their data from the DB into
XML and then mess with the XML.

There are also some people who still fight against PHP's cool
short-tag on behalf of messy XML.

---
"One who mix sports and patriotism is a barbarian"
Email: rrjanbiah-at-Y!com
Jul 17 '05 #7
