I have written a simple SAX parser for a very simple XML file.
When I run the code below I get this error:
"Invalid hexadecimal character reference"
The strange thing is that if I change the chunk size of the data I send
to the parser, the line at which the error is reported changes.
I ran one more test with the chunk size set equal to the
file size, and I get the same error, reported at the end of the file.
The same XML file processed in another language doesn't raise any
error.
I use PHP 5.2.3 with an Apache/MySQL/PHP stack (AppServ Open Project 2.5.9
for Windows) on a Windows Vista PC.
The code I have used follows:
public function create_parser($filename)
{
    $this->fp = fopen($filename, 'r');
    $this->fsize = filesize($filename);
    $this->parser = xml_parser_create();
    xml_set_element_handler($this->parser,
        'Parser::start_element', 'Parser::end_element');
    xml_set_character_data_handler($this->parser, 'Parser::char_data');
}

public function parse()
{
    //$blockSize = 4*1024;
    $blockSize = $this->fsize;
    echo 'File length: ' . $this->fsize;
    while ($data = fread($this->fp, $blockSize))
    {
        //$data = str_replace('\n','',$data);
        // The third argument tells the parser whether this is the last chunk.
        if (!xml_parse($this->parser, $data, feof($this->fp)))
        {
            // Note: xml_error_string() is documented as taking an error code
            // (from xml_get_error_code()), but the parser itself is passed here.
            echo 'Parser error: (' . xml_get_current_byte_index($this->parser)
                . ')\'' . xml_error_string($this->parser) . '\' at line '
                . xml_get_current_line_number($this->parser) . ' at col '
                . xml_get_current_column_number($this->parser);
            return false;
        }
    }
    return true;
}
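For comparison, here is a minimal sketch of error reporting that first fetches the numeric code with xml_get_error_code() and only then passes it to xml_error_string(), which is how the PHP manual documents these two functions. The helper name describe_xml_error() and the malformed test string are assumptions made for illustration; they are not part of the class above, and whether this changes the message reported for the real file is untested.

```php
<?php
// Hypothetical helper, not part of the original class: builds an error
// message from a parser on which xml_parse() has just failed.
function describe_xml_error($parser)
{
    // xml_get_error_code() returns the numeric code of the last error;
    // xml_error_string() turns that code into a readable message.
    $code = xml_get_error_code($parser);
    return sprintf(
        "Parser error %d: '%s' at line %d, column %d (byte %d)",
        $code,
        xml_error_string($code),
        xml_get_current_line_number($parser),
        xml_get_current_column_number($parser),
        xml_get_current_byte_index($parser)
    );
}

// Usage with a deliberately malformed document (mismatched tags).
$parser = xml_parser_create();
if (!xml_parse($parser, '<a><b></a>', true)) {
    echo describe_xml_error($parser), "\n";
}
```

If the message printed this way differs from the one quoted at the top, the original text may have come from the value passed to xml_error_string() rather than from the actual parse error.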
A piece of the XML file follows:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<incollection mdate="2002-01-03"
key="books/acm/kim95/AnnevelinkACFHK95">
<author>
Jurgen Annevelink
</author>
<author>
Rafiul Ahad
</author>
<author>
Amelia Carlson
</author>
<author>
Daniel H. Fishman
</author>
<author>
Michael L. Heytens
</author>
<author>
.....
The Industrial Information Technology Handbook
</booktitle>
<url>
db/books/collections/IITHandbook2005.html#SeyfarthK05
</url>
</incollection>
</dblp>
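One way to narrow this down is to look at the raw bytes around the byte index printed in the error message. The sketch below assumes that the reported index is an offset into the whole file, which may not hold when the data is fed in several chunks. The function name dump_context() and the $radius parameter are hypothetical and only for illustration.

```php
<?php
// Hypothetical helper: returns the bytes within $radius of $offset in
// $filename, so the text near a reported error position can be inspected.
function dump_context($filename, $offset, $radius = 40)
{
    $start = max(0, $offset - $radius);
    $fp = fopen($filename, 'rb');
    if ($fp === false) {
        return false;
    }
    fseek($fp, $start);                 // jump to just before the offset
    $bytes = fread($fp, 2 * $radius);   // read the surrounding window
    fclose($fp);
    return $bytes;
}
```

Calling it with the file name and the value from xml_get_current_byte_index() should show whether an entity or character reference sits near the reported position.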