problem parsing XML files with < and > in cdata-section

Hi,

I am using the following code (see below) from php.net
(http://www.php.net/manual/en/ref.xml.php, example 1) to parse an XML
file (encoded in UTF-8). I changed the code slightly so that the cdata
sections are echoed instead of the element names as in the original
example.

In the cdata sections of my XML file I have terms like this:

Cap&lt;Finanzinstrument&gt;

The parser echoes them as follows (echo $data . "<br>";):

Cap
<
Finanzinstrument


Can anyone explain this to me? Why does the parser split the
cdata-section with &lt; and &gt; in it? Is there any way to avoid
this?

Thanks very much in advance,

greetings, wenke

--------------------------------------------

<?php
$file = "ck_bsp.xml";
$depth = array();

function startElement($parser, $name, $attrs)
{
    global $depth;
    for ($i = 0; $i < $depth[$parser]; $i++) {
        echo " ";
    }
    //echo "$name\n";
    $depth[$parser]++;
}

function endElement($parser, $name)
{
    global $depth;
    $depth[$parser]--;
}

function characterData($parser, $data)
{
    echo $data . "<br>";
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
?>
--------------------------------------------
Jul 17 '05 #1
we*********@gmx.de (wenke) wrote in
news:a6**************************@posting.google.com:

> In the cdata sections of my XML file I have terms like this:
>
> Cap&lt;Finanzinstrument&gt;
>
> The parser echoes them as follows (echo $data . "<br>";):
>
> Cap
> <
> Finanzinstrument
>
> Can anyone explain this to me? Why does the parser split the
> cdata-section with &lt; and &gt; in it? Is there any way to avoid
> this?


"Stream-oriented" XML parsers (like expat, which is what PHP uses) are
almost never guaranteed to return maximum-length pieces of character data,
because doing so requires some rather complicated internal buffering that
slows them down. In particular, they usually stop a chunk at the
beginning of an entity reference. You simply have to be prepared for
consecutive calls to your character data handler.
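
To watch the chunking happen, a minimal sketch along these lines may help. It assumes only PHP's bundled XML extension; the inline <term> document and the $chunks array are made up for illustration and are not part of the original script. How many pieces the text arrives in depends on the parser library and version, so the only thing to rely on is that the pieces concatenate to the full text.

```php
<?php
// Illustrative sketch: record every chunk passed to the character data handler.
$chunks = [];

$parser = xml_parser_create('UTF-8');
xml_set_character_data_handler($parser, function ($parser, $data) use (&$chunks) {
    $chunks[] = $data; // one entry per handler call
});

// Hypothetical one-element document containing the term from the question.
$xml = '<term>Cap&lt;Finanzinstrument&gt;</term>';
if (!xml_parse($parser, $xml, true)) {
    die(xml_error_string(xml_get_error_code($parser)));
}

// The number of chunks is not guaranteed; their concatenation is.
var_dump($chunks);
echo implode('', $chunks), "\n"; // Cap<Finanzinstrument>
```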
Jul 17 '05 #2
Eric Bohlman <eb******@earthlink.net> wrote in message news:<Xn*******************************@130.133.1.4>...
> we*********@gmx.de (wenke) wrote in
> news:a6**************************@posting.google.com:
>> In the cdata sections of my XML file I have terms like this:
>>
>> Cap&lt;Finanzinstrument&gt;
>>
>> The parser echoes them as follows (echo $data . "<br>";):
>>
>> Cap
>> <
>> Finanzinstrument
>>
>> Can anyone explain this to me? Why does the parser split the
>> cdata-section with &lt; and &gt; in it? Is there any way to avoid
>> this?
>
> "Stream-oriented" XML parsers (like expat, which is what PHP uses) are
> almost never guaranteed to return maximum-length pieces of character data,
> because doing so requires some rather complicated internal buffering that
> slows them down. In particular, they usually stop a chunk at the
> beginning of an entity reference. You simply have to be prepared for
> consecutive calls to your character data handler.


Could you please explain this more precisely? How do I know whether the
output the parser is giving me still belongs to the previous cdata section
or to a new one (especially if the structure of the data might vary)?
Thanks!
Jul 17 '05 #3
we*********@gmx.de (wenke) wrote in
news:a6**************************@posting.google.com:

> Eric Bohlman <eb******@earthlink.net> wrote in message
> news:<Xn*******************************@130.133.1.4>...
>> "Stream-oriented" XML parsers (like expat, which is what PHP uses)
>> are almost never guaranteed to return maximum-length pieces of
>> character data, because doing so requires some rather complicated
>> internal buffering that slows them down. In particular, they usually
>> stop a chunk at the beginning of an entity reference. You simply
>> have to be prepared for consecutive calls to your character data
>> handler.
>
> Could you please explain this more precisely? How do I know whether the
> output the parser is giving me still belongs to the previous cdata section
> or to a new one (especially if the structure of the data might vary)?


If there were no intervening start-element or end-element events, then two
character-data events are referring to consecutive parts of the same text.
The usual trick is to clear out a text buffer at the end of the code for
each start-element or end-element event (the code would have made use of
anything that was previously in the buffer), and simply append the text to
it in character-data events.
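
Applied to the script from the first post, that might look roughly like the sketch below. The names $buffer, $texts and flushBuffer() are invented for illustration, the inline XML stands in for ck_bsp.xml, and skipping whitespace-only runs and escaping the output with htmlspecialchars() are assumptions about what is wanted, not something the original code does.

```php
<?php
// Illustrative sketch of the buffering approach; $buffer, $texts and
// flushBuffer() are hypothetical names, not part of the original script.
$buffer = '';
$texts  = [];

function flushBuffer()
{
    global $buffer, $texts;
    if (trim($buffer) !== '') {
        $texts[] = $buffer; // a complete run of text is available here
    }
    $buffer = '';
}

function startElement($parser, $name, $attrs)
{
    flushBuffer(); // text before this start tag is complete
}

function endElement($parser, $name)
{
    flushBuffer(); // text before this end tag is complete
}

function characterData($parser, $data)
{
    global $buffer;
    $buffer .= $data; // may be called several times per text run
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

// Hypothetical stand-in for the contents of ck_bsp.xml.
$xml = '<terms><term>Cap&lt;Finanzinstrument&gt;</term><term>Floor</term></terms>';
if (!xml_parse($xml_parser, $xml, true)) {
    die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($xml_parser)),
        xml_get_current_line_number($xml_parser)));
}

foreach ($texts as $text) {
    // Escape for HTML so "<Finanzinstrument>" is not swallowed as a tag.
    echo htmlspecialchars($text), "<br>\n";
}
```

Each entry in $texts then holds one whole text run, no matter how many handler calls it arrived in. When reading from a file in 4096-byte blocks as in the original loop, the same handlers work unchanged, because a run split across two fread() calls is still only flushed at the next tag.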
Jul 17 '05 #4
