Losing data while parsing XML with Expat

Hello,

I have a weird problem and need your help and ideas...

I've written a PHP application that imports data in XML format and
writes it to a MySQL database for faster access.

The application uses Expat 1.95.7 via PHP to parse the XML data.

At first everything seemed to work fine, but now I have noticed that
something goes wrong:

If the amount of XML data is larger than what I used for testing the
application (we're talking about something between 2 and 4 MB), some
data gets lost.

If the structure of the file doesn't change, the lost data is always
the same.

But if I change the structure of the file, e.g. by adding a line
somewhere, the problem occurs in a different place.

For example:

<event>

<SysId>27</SysId>

<ClientId>1</ClientId>

<EventNo>9402</EventNo>

<EventName>Martin Schneider Karben</EventName>

<category>

<Type>Keine Veranstaltungsart</Type>

</category>

.....

</event>

Let's assume that the "Mar" of the data between the <EventName> tags gets
lost and we get "tin Schneider Karben".

When I insert a line above the <event> block, the "t" from "tin" also gets
lost, so we have "in Schneider Karben".

Why?

I also tried to dynamically generate parts of the XML data with PHP:
//--------------- CODE ------------------------------------------------------//
<?php
// num of datasets
$datasets = 2000;
// build the xml string (plain assignment: $str is not defined before this line)
$str = '<?xml version="1.0" encoding="ISO-8859-1"?><program xmlns="http://www.orestes.de">'."\n";
for($i=0; $i<$datasets; $i++){
$str .= '<event>
<SysId>27</SysId>
<ClientId>1</ClientId>
<EventNo>'.$i.'</EventNo>
<EventName>NUM'.$i.'</EventName>
<category>
<Type>Keine Veranstaltungsart</Type>
</category>
<location>
<Name>location_name_'.$i.'</Name>
<Street>Strasse</Street>
<ZIP>32333</ZIP>
<City>City</City>
<Country></Country>
</location>
<Currency>EUR</Currency>
<show>
<ShowNo>1</ShowNo>
<ShowDate>31.12.2004</ShowDate>
<ShowTime>20:00</ShowTime>
<ShowWeekday>Freitag</ShowWeekday>
<ShowPage href="32160001.jsp">TPP Gutscheine</ShowPage>
<Info></Info>
<block number="0">
<FreeSeats>61</FreeSeats>
</block>
</show>
</event>';
}
$str .= "</program>";
// write the data to file
$fp = fopen("../DATA/elektra.xml","w");
fputs($fp, $str);
fclose($fp);
?>
//--------------- CODE END --------------------------------------------------//

With this generated file, NUM1644 becomes 1644 and NUM1195 becomes 5.
All other data is parsed correctly?!
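
Since the loss moves whenever the file layout changes, one thing worth checking is whether the damaged values sit on a fixed-size read boundary. The sketch below is only a diagnostic under an assumption: that the parser is fed the file in fixed-size pieces (4096 bytes is a guess, not something confirmed here). The helper name findStraddlingValues() is made up for this example.

```php
<?php
// Hypothetical helper: list the text values of <$tag> elements that straddle
// a chunk boundary when the input is read in fixed-size pieces.
// The 4096-byte default is an assumption about how the parser is fed.
function findStraddlingValues(string $xml, string $tag, int $chunkSize = 4096): array
{
    $hits = [];
    $open = '<' . $tag . '>';
    $close = '</' . $tag . '>';
    $offset = 0;

    while (($start = strpos($xml, $open, $offset)) !== false) {
        $textStart = $start + strlen($open);
        $textEnd = strpos($xml, $close, $textStart);
        if ($textEnd === false) {
            break; // unterminated element, stop scanning
        }
        // The text straddles a boundary if its first and last byte
        // fall into different chunks.
        if ($textEnd > $textStart
            && intdiv($textStart, $chunkSize) !== intdiv($textEnd - 1, $chunkSize)) {
            $hits[] = substr($xml, $textStart, $textEnd - $textStart);
        }
        $offset = $textEnd + strlen($close);
    }

    return $hits;
}
```

Called as findStraddlingValues(file_get_contents("../DATA/elektra.xml"), "EventName"), it returns the event names that cross a boundary. If those turn out to be exactly the damaged ones, the chunked reading is the likely trigger.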
Here is the code of the two classes used for parsing and importing:
//--------------- CODE ------------------------------------------------------//
<?php
require_once "DB.php";

class ElektraImporter
{
var $FileHash;
var $DAO;
var $XMLDataFile;

function ElektraImporter(){
$this->XMLDataFile = Config::getAttribute("Config/Config_Base",
"elektra_xml");

$DB = DB::connect(Config::getAttribute("Config/Config_Base",
"dsn"));
if(DB::isError($DB)){die(DB::ErrorMessage($DB));}
$this->DAO = Loader::buildObject("XML/ElektraDAO", null, $DB);
}
/**
* checks for changes on the elektra xml data.
* If there are changes the database will be refreshed
*/
function checkForUpdate(){
/* if there are changes */
if($this->_hasElektraFileChanged($this->DAO->getElektraFileHashCode())){
/* read the file and update the database */
$this->DAO->updateElektraData($this->_getElektraData());
} else {
/* everything is o.k. */
}
}
/**
* parse the xml file and get the needed data
* @return array $data
*/
function _getElektraData(){
$Parser = &Loader::buildObject("XML/ElektraParser", null,
array(&$arr));
if( PEAR::isError($Parser) ){
die (PEAR::errorMsg($Parser));
}
$Parser->setInputFile($this->XMLDataFile);
if(PEAR::isError($Parser)){ die($Parser->getMessage()); }

$data = $Parser->getXMLData();

$data['filehash'] = md5_file($this->XMLDataFile);

return $data;
}
/**
* checks if the file has changed
* @return boolean
*/
function _hasElektraFileChanged($filehash = ""){
$this->FileHash = md5_file($this->XMLDataFile);

if($filehash == $this->FileHash){
return false;
} else {
return true;
}
}
}
?>
//--------------- CODE END --------------------------------------------------//
The parser class, extending PEAR's XML_Parser:
//--------------- CODE ------------------------------------------------------//
<?php
require_once "XML/Parser.php";
class ElektraParser extends XML_Parser
{
var $XMLData;
var $EventNo;
var $EventName;
var $LastEventNo;
var $ActualEventNo;
var $EventCnt = 0;
var $ShowCnt = 0;

function ElektraParser(&$arr){
$this->XMLData = &$arr;
$this->XML_Parser("ISO-8859-1", "event", "ISO-8859-1");
}

function startHandler($xp, $element, $attribs) {
$this->Element = $element;
$this->Attribs = $attribs;
}

function endHandler($xp, $element) {
if ( $element == "EVENT" ){
/* increase event counter */
$this->EventCnt++;
/* set show counter to 0 */
$this->ShowCnt = 0;
}
elseif ( $element == "SHOW" ){
/* increase show counter for the next show */
$this->ShowCnt ++;
}
$this->Element = "";
}

function cdataHandler($xp, $cdata) {
if($this->Element == "DATE"){
$this->XMLData['creationdate'] = $cdata;
}
elseif($this->Element == "TIME"){
$this->XMLData['creationtime'] = $cdata;
}
/* every event has a sysid the sysid and the eventno make the unique
eventid */
elseif($this->Element == "SYSID"){
$this->XMLData['event'][$this->EventCnt]['sysid'] = $cdata;
}
elseif($this->Element == "CLIENTID"){
$this->XMLData['event'][$this->EventCnt]['clientid'] = $cdata;
}
elseif($this->Element == "EVENTNO"){
$this->XMLData['event'][$this->EventCnt]['eventno'] = $cdata;
}
elseif($this->Element == "EVENTNAME"){
$this->XMLData['event'][$this->EventCnt]['eventname'] = $cdata;
}
elseif($this->Element == "NAME"){
$this->XMLData['event'][$this->EventCnt]['location'] = $cdata;
}
elseif($this->Element == "CITY"){
$this->XMLData['event'][$this->EventCnt]['city'] = $cdata;

/* eventgroups */
/* get the position of the first occurence of the city in the
eventname */
$pos = strpos($this->XMLData['event'][$this->EventCnt]['eventname'],
$cdata);
/* if there´s the city in the name */
if( $pos ){
$this->XMLData['event'][$this->EventCnt]['group'] =
trim(substr($this->XMLData['event'][$this->EventCnt]['eventname'], 0,
$pos));
}
/* otherwise we take the whole eventname as group */
else {
$this->XMLData['event'][$this->EventCnt]['group'] =
trim($this->XMLData['event'][$this->EventCnt]['eventname']);
}
}
/* get the shows */
elseif($this->Element == "SHOWNO") {
$this->XMLData['event'][$this->EventCnt]['show'][$this->ShowCnt]['showno']
= $cdata;
}
elseif($this->Element == "SHOWDATE") {
$this->XMLData['event'][$this->EventCnt]['show'][$this->ShowCnt]['showdate']
= $cdata;
}
elseif($this->Element == "SHOWTIME") {
$this->XMLData['event'][$this->EventCnt]['show'][$this->ShowCnt]['showtime']
= $cdata;
}
elseif($this->Element == "SHOWPAGE"){
$this->XMLData['event'][$this->EventCnt]['show'][$this->ShowCnt]['showpage']
= $this->Attribs['HREF'];
}
}
function defaultHandler($xp, $cdata) {

}
function &getXMLData(){
$p = $this->parse();
if(PEAR::isError($p)){ die($p->getMessage()); }
return $this->XMLData;
}
}
?>
//--------------- CODE END --------------------------------------------------//
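
A possible cause, offered as a hypothesis and not a confirmed diagnosis: parsers of this kind may deliver the text of a single element in several calls to the character data handler, for example when the text crosses the end of an input chunk. A handler that assigns with `=` then keeps only the last piece, which would match "NUM1644" turning into "1644". The sketch below uses PHP's built-in xml_* functions directly instead of PEAR's XML_Parser. The function parseInChunks() and the two-piece input are made up for illustration.

```php
<?php
// Sketch: collect character data per element and store it only when the
// closing tag arrives, so the number of pieces delivered does not matter.
// parseInChunks() is a hypothetical name used only in this example.
function parseInChunks(array $chunks): array
{
    $names = [];
    $buffer = '';

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($xp, $element, $attribs) use (&$buffer) {
            $buffer = ''; // a new element starts with an empty buffer
        },
        function ($xp, $element) use (&$buffer, &$names) {
            // Case folding is on by default, so names arrive upper-cased.
            if ($element === 'EVENTNAME') {
                $names[] = $buffer;
            }
            $buffer = '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($xp, $cdata) use (&$buffer) {
            $buffer .= $cdata; // append instead of overwrite
        }
    );

    // Feed the input piece by piece; only the last call is marked final.
    $last = count($chunks) - 1;
    foreach (array_values($chunks) as $i => $chunk) {
        xml_parse($parser, $chunk, $i === $last);
    }

    return $names;
}

// The split deliberately falls in the middle of the event name.
$names = parseInChunks([
    '<program><event><EventName>NUM',
    '1644</EventName></event></program>',
]);
```

Whether the text is really split at that point depends on the underlying XML library, so the sketch relies only on the accumulated value at the closing tag. Applied to the class above, this would mean appending in cdataHandler and moving the assignments into endHandler.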

This problem is really bad because event IDs have been truncated as well,
and then my SQL statements didn't work anymore!

I have no idea what the reason is or even might be =(
A bug in Expat? ... I can't really believe that.
Badly formatted XML? ... Not really!?
Problems with Expat's memory management?
Or just my fault? ... Where?

But it seems that the problem is coupled to the format of the XML
file.
If I take out line breaks or add lines, the error occurs in other
places!?
But the same structure always produces the same errors?!
My XML skills are not that good, so I would be very pleased if you have
an idea or some advice for me.
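
To rule out the "badly formatted XML" possibility mentioned above, the file can be run through the parser on its own, with no handlers attached, and the first reported error printed. This is a minimal sketch using PHP's built-in xml_* functions; checkWellFormed() is a hypothetical helper name.

```php
<?php
// Sketch: return a description of the first well-formedness error,
// or null if the parser accepts the whole document.
// checkWellFormed() is a hypothetical helper, not part of any library.
function checkWellFormed(string $xml): ?string
{
    $parser = xml_parser_create();

    // xml_parse() returns 1 on success and 0 on failure.
    if (xml_parse($parser, $xml, true) === 1) {
        return null;
    }

    return sprintf(
        '%s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}
```

If checkWellFormed(file_get_contents("../DATA/elektra.xml")) returns null, the file itself is unlikely to be the problem and the handlers deserve a closer look.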

Thanks for your advice.

With best regards

Fabian Krüger
Jul 20 '05 #1