problem parsing XML files with &lt; and &gt; in cdata-section

wenke

Hi,

I am using the following code (see below) from php.net
(http://www.php.net/manual/en/ref.xml.php, example 1) to parse an XML
file (encoded in UTF-8). I changed the code slightly so that the cdata
sections are echoed rather than the element names, which is what the
original example prints.

In the cdata sections of my XML file I have terms like this:

Cap&lt;Finanzinstrument&gt;

The parser echoes them as follows (echo $data . " ";):

Cap
<
Finanzinstrument

Can anyone explain this to me? Why does the parser split a
cdata section that has &lt; and &gt; in it? Is there any way to avoid
this?

Thanks very much in advance,

greetings, wenke

--------------------------------------------

<?php
$file = "ck_bsp.xml";
$depth = array();

function startElement($parser, $name, $attrs)
{
    global $depth;
    for ($i = 0; $i < $depth[$parser]; $i++) {
        echo " ";
    }
    //echo "$name\n";
    $depth[$parser]++;
}

function endElement($parser, $name)
{
    global $depth;
    $depth[$parser]--;
}

function characterData($parser, $data)
{
    echo $data . " ";
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
?>
--------------------------------------------

Jul 17 '05 #1


Eric Bohlman

we*********@gmx.de (wenke) wrote in
news:a6**************************@posting.google.c om:

In the cdata sections of my XML file I have terms like this:

Cap&lt;Finanzinstrument&gt;

The parser echoes them as follows (echo $data . " ";):

Cap
<
Finanzinstrument

Can anyone explain this to me? Why does the parser split a
cdata section that has &lt; and &gt; in it? Is there any way to avoid
this?

"Stream-oriented" XML parsers (like expat, which is what PHP uses) are
almost never guaranteed to return maximum-length pieces of character data,
because doing so requires some rather complicated internal buffering that
slows them down. In particular, they usually stop a chunk at the
beginning of an entity reference. You simply have to be prepared for
consecutive calls to your character data handler.
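To see the chunking described above for yourself, a small sketch along these lines may help. The $chunks array and the inline sample XML are assumptions made for illustration, not part of the original code. It records every piece passed to the character data handler, so the boundaries become visible.

```php
<?php
// Illustrative sketch: $chunks and the sample XML are made up for this example.
$chunks = array();

$parser = xml_parser_create();
xml_set_character_data_handler($parser, function ($parser, $data) use (&$chunks) {
    // Record each piece exactly as the parser delivers it.
    $chunks[] = $data;
});

$xml = '<term>Cap&lt;Finanzinstrument&gt;</term>';
if (!xml_parse($parser, $xml, true)) {
    die(xml_error_string(xml_get_error_code($parser)));
}

// How many chunks arrive depends on the underlying parser library,
// but joined together they give the complete text of the element.
echo count($chunks) . " chunk(s): " . implode(' | ', $chunks) . "\n";
echo implode('', $chunks) . "\n";
```

The number of chunks is not something to rely on; only the concatenation of consecutive chunks is meaningful.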

Jul 17 '05 #2

wenke

Eric Bohlman <eb******@earthlink.net> wrote in message news:<Xn*******************************@130.133.1. 4>...


"Stream-oriented" XML parsers (like expat, which is what PHP uses) are
almost never guaranteed to return maximum-length pieces of character data,
because doing so requires some rather complicated internal buffering that
slows them down. In particular, they usually stop a chunk at the
beginning of an entity reference. You simply have to be prepared for
consecutive calls to your character data handler.

Could you please explain this more precisely? How do I know whether the
output the parser gives me still belongs to the previous cdata section
or to a new one (especially if the structure of the data might vary)?
Thanks!

Jul 17 '05 #3

Eric Bohlman

we*********@gmx.de (wenke) wrote in
news:a6**************************@posting.google.c om:

Eric Bohlman <eb******@earthlink.net> wrote in message
news:<Xn*******************************@130.133.1. 4>...
"Stream-oriented" XML parsers (like expat, which is what PHP uses)
are almost never guaranteed to return maximum-length pieces of
character data, because doing so requires some rather complicated
internal buffering that slows them down. In particular, they usually
stop a chunk at the beginning of an entity reference. You simply
have to be prepared for consecutive calls to your character data
handler.

Could you please explain this more precisely? How do I know whether the
output the parser gives me still belongs to the previous cdata section
or to a new one (especially if the structure of the data might vary)?

If there were no intervening start-element or end-element events, then two
character-data events are referring to consecutive parts of the same text.
The usual trick is to clear out a text buffer at the end of the code for
each start-element or end-element event (the code would have made use of
anything that was previously in the buffer), and simply append the text to
it in character-data events.
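A minimal sketch of that buffering trick, assuming a tiny inline XML string instead of the ck_bsp.xml file. The names $textBuffer, $collected and flushText() are hypothetical helpers introduced here, not part of the php.net example.

```php
<?php
// Sketch only: $textBuffer, $collected and flushText() are illustrative names.
$textBuffer = '';      // text gathered since the last element boundary
$collected = array();  // complete text runs, one entry per run

function flushText()
{
    global $textBuffer, $collected;
    // Use whatever was buffered, then clear it for the next run of text.
    if (trim($textBuffer) !== '') {
        $collected[] = $textBuffer;
    }
    $textBuffer = '';
}

function startElement($parser, $name, $attrs)
{
    flushText();
}

function endElement($parser, $name)
{
    flushText();
}

function characterData($parser, $data)
{
    global $textBuffer;
    // Only append here; consecutive calls belong to the same text.
    $textBuffer .= $data;
}

$xml = '<terms><term>Cap&lt;Finanzinstrument&gt;</term><term>Floor</term></terms>';

$parser = xml_parser_create();
xml_set_element_handler($parser, 'startElement', 'endElement');
xml_set_character_data_handler($parser, 'characterData');
if (!xml_parse($parser, $xml, true)) {
    die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)));
}

foreach ($collected as $text) {
    echo $text . "\n";
}
```

With this arrangement the term comes out as one string, Cap<Finanzinstrument>, no matter how many pieces the parser delivered it in. In the file-based loop from the original post the same handlers should work unchanged, since the buffer survives across xml_parse() calls.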

Jul 17 '05 #4
