SAX parsing problem

I've run into some strange behavior that I'm hoping someone can
explain. I've written a simple handler that works, with one small
exception: when the parser encounters a line with '&' in it, it
only returns the portion that follows the occurrence.

For example, parsing a file with the line :
<key>mykey</key><value>some%20&%20value</value>

results in getting "%20value" back from the characters method, rather
than "some%20&%20value".

After looking into this a bit, I found that SAX supports entities, so
the parser is probably treating the & as an entity and processing
it in some way that I'm unaware of. I'm using the default
EntityResolver.

Any help/info would be much appreciated.

gh
Jul 18 '05 #1


anon <an**@anon.net> writes:
> I've run into some strange behavior that I'm hoping someone can
> explain. I've written a simple handler that works, with one small
> exception: when the parser encounters a line with '&' in it, it
> only returns the portion that follows the occurrence.
>
> For example, parsing a file with the line :
> <key>mykey</key><value>some%20&%20value</value>
>
> results in getting "%20value" back from the characters method, rather
> than "some%20&%20value".
>
> After looking into this a bit, I found that SAX supports entities, so
> the parser is probably treating the & as an entity and processing
> it in some way that I'm unaware of. I'm using the default
> EntityResolver.


Are you sure you're not actually getting three chunks: "some%20", "&",
and "%20value"? The xml.sax.handler.ContentHandler.characters method
(which I presume you're using for SAX, as you don't mention!) is not
guaranteed to get all contiguous character data in one call. Also check
if .skippedEntity() methods are firing.
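
One way to check both points is a small diagnostic handler. This is only a sketch: it assumes Python 3, assumes the file actually contains the escaped form &amp; (a bare & would not be well-formed XML), and wraps the sample line in a made-up <entry> root element. The names ChunkLogger and SAMPLE are invented for the illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ChunkLogger(ContentHandler):
    """Records every characters() call and every skippedEntity() call."""

    def __init__(self):
        super().__init__()
        self.chunks = []   # one item per characters() call
        self.skipped = []  # one item per skippedEntity() call

    def characters(self, content):
        self.chunks.append(content)

    def skippedEntity(self, name):
        self.skipped.append(name)


# Assumed input: the sample line with the ampersand escaped, inside a
# hypothetical <entry> root so the document is well-formed.
SAMPLE = b"<entry><key>mykey</key><value>some%20&amp;%20value</value></entry>"

logger = ChunkLogger()
xml.sax.parseString(SAMPLE, logger)

print(logger.chunks)   # how the text was split is up to the parser
print(logger.skipped)  # names of any entities the parser skipped
```

If the text of <value> shows up as several items in logger.chunks, the handler is simply seeing one piece per call, and nothing has been lost.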

--
|>|\/|<
David M. Cooke
cookedm(at)physics(dot)mcmaster(dot)ca
Jul 18 '05 #2

gh
In article <qn*************@arbutus.physics.mcmaster.ca>, David M.
Cooke <co**********@physics.mcmaster.ca> wrote:

> Are you sure you're not actually getting three chunks: "some%20", "&",
> and "%20value"? The xml.sax.handler.ContentHandler.characters method
> (which I presume you're using for SAX, as you don't mention!) is not
> guaranteed to get all contiguous character data in one call. Also check
> if .skippedEntity() methods are firing.


Yes, skippedEntity() wasn't firing, but you are correct about receiving
three chunks. The characters handler routine is fired three times for a
single text block. Why does it do this? Is there a way to prevent
it?

Much thanks.

gh
Jul 18 '05 #3

On Wed, 2005-03-16 at 00:14 -0800, gh wrote:
> The characters handler routine is fired 3 times for a
> single text block. Why does it do this? Is there a way to prevent
> doing this?


Continuing in the vein of closing matters cross-posted to XML-SIG:

http://mail.python.org/pipermail/xml...ch/011013.html
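
For readers who cannot follow the link, a common way to cope with split text is to buffer it in the handler and join the pieces when the element ends. The sketch below is not necessarily what the linked message recommends. It assumes Python 3, the escaped form &amp; in the input, and a made-up <entry> root element; TextJoiningHandler and its texts attribute are invented names.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextJoiningHandler(ContentHandler):
    """Buffers characters() chunks and joins them at each end tag."""

    def __init__(self):
        super().__init__()
        self._buffer = []
        self.texts = []  # (element name, full text) pairs

    def startElement(self, name, attrs):
        # Start collecting afresh for each element.
        self._buffer = []

    def characters(self, content):
        # May be called several times for one run of text.
        self._buffer.append(content)

    def endElement(self, name):
        text = "".join(self._buffer)
        self._buffer = []
        if text.strip():
            self.texts.append((name, text))


# Assumed input: the sample line with the ampersand escaped, inside a
# hypothetical <entry> root so the document is well-formed.
SAMPLE = b"<entry><key>mykey</key><value>some%20&amp;%20value</value></entry>"

handler = TextJoiningHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.texts)
```

This simple version resets the buffer at every start tag, so it suits leaf elements like <key> and <value>; mixed content (text interleaved with child elements) would need a stack of buffers instead.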

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Use CSS to display XML, part 2 - http://www-128.ibm.com/developerwork...xmlcss2-i.html
Introducing the Amara XML Toolkit - http://www.xml.com/pub/a/2005/01/19/amara.html
Gems from the Mines: 2002 to 2003 - http://www.xml.com/pub/a/2005/03/02/pyxml.html
Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286
Querying WordNet as XML - http://www.ibm.com/developerworks/xm...x-think29.html
Packaging XSLT lookup tables as EXSLT functions - http://www.ibm.com/developerworks/xm...-tiplook2.html

Jul 18 '05 #4
