
"Invalid hexadecimal character reference" error parsing an XML file with a SAX parser

Hi everyone,

I have created a simple SAX parser for a very simple XML file.

When I run the code that follows I get this error:

"Invalid hexadecimal character reference"

The strange thing is that if I change the chunk size of the data I send
to the parser, the line at which the error is reported changes.

I ran one more test with the chunk size set equal to the file size, and
I get the same error, this time reported at the end of the file.

The same XML file processed with another language doesn't raise any
error.

I use PHP 5.2.3 on a LAMP-style stack (AppServ Open Project 2.5.9 for
Windows) on a Windows Vista PC.

The code I have used follows:

public function create_parser($filename)
{
    $this->fp = fopen($filename, 'r');
    $this->fsize = filesize($filename);
    $this->parser = xml_parser_create();
    // The handlers are static methods of the Parser class.
    xml_set_element_handler($this->parser,
        'Parser::start_element', 'Parser::end_element');
    xml_set_character_data_handler($this->parser, 'Parser::char_data');
}

public function parse()
{
    //$blockSize = 4*1024;
    $blockSize = $this->fsize;
    echo 'File length: '.$this->fsize;
    while ($data = fread($this->fp, $blockSize))
    {
        //$data = str_replace('\n','',$data);
        // The third argument tells the parser whether this is the last chunk.
        if (!xml_parse($this->parser, $data, feof($this->fp)))
        {
            // xml_error_string() expects an error code, not the parser
            // resource, so the code is fetched with xml_get_error_code().
            echo 'Parser error: ('.xml_get_current_byte_index($this->parser).') \''.
                xml_error_string(xml_get_error_code($this->parser)).'\' at line '.
                xml_get_current_line_number($this->parser).' at col '.
                xml_get_current_column_number($this->parser);
            return false;
        }
    }
    return true;
}
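
To narrow down what the parser is actually looking at when it fails, one option is to dump the raw bytes around the offset reported by xml_get_current_byte_index(). What follows is only a rough sketch: the helper name dump_context, its parameters, and the default radius of 16 bytes are assumptions made for illustration and are not part of the Parser class above.

```php
<?php
// Hypothetical debugging helper (not part of the Parser class above).
// Returns the raw bytes surrounding a byte offset, together with a hex
// dump so that non-ASCII bytes (the file is ISO-8859-1) stay visible.
function dump_context($filename, $byteIndex, $radius = 16)
{
    // Start a little before the reported offset, but never before byte 0.
    $start = max(0, $byteIndex - $radius);

    // 'rb' avoids any newline translation on Windows.
    $fp = fopen($filename, 'rb');
    if ($fp === false) {
        return false;
    }

    fseek($fp, $start);
    $bytes = fread($fp, 2 * $radius);
    fclose($fp);

    return array(
        'start' => $start,          // offset of the first byte returned
        'text'  => $bytes,          // raw bytes as read from the file
        'hex'   => bin2hex($bytes), // same bytes, hex-encoded
    );
}
```

It could be called from the error branch of parse(), for example as print_r(dump_context($filename, xml_get_current_byte_index($this->parser))), assuming the file name is still available at that point.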

A fragment of the XML file follows:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<incollection mdate="2002-01-03"
key="books/acm/kim95/AnnevelinkACFHK95">
<author>
Jurgen Annevelink
</author>
<author>
Rafiul Ahad
</author>
<author>
Amelia Carlson
</author>
<author>
Daniel H. Fishman
</author>
<author>
Michael L. Heytens
</author>
<author>

.....
The Industrial Information Technology Handbook
</booktitle>
<url>
db/books/collections/IITHandbook2005.html#SeyfarthK05
</url>
</incollection>
</dblp>

Aug 5 '08 #1