SAX parsing problem

I've run into some strange behavior that I'm hoping someone can
explain. I've written a simple handler that works with one small
exception: when the parser encounters a line with '&' in it, the
characters method only returns the portion that follows the occurrence.

For example, parsing a file with the line:
<key>mykey</key><value>some%20&%20value</value>

results in getting "%20value" back from the characters method, rather
than "some%20&%20value".

After looking into this a bit, I found that SAX supports entities, so
it is probably treating the & as an entity and processing it in some
way that I'm unaware of. I'm using the default EntityResolver.

Any help/info would be much appreciated.

gh
Jul 18 '05 #1
Are you sure you're not actually getting three chunks: "some%20", "&",
and "%20value"? The xml.sax.handler.ContentHandler.characters method
(which I presume you're using for SAX, as you don't mention!) is not
guaranteed to get all contiguous character data in one call. Also check
if .skippedEntity() methods are firing.
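
A quick way to check both points is to record every call instead of keeping only the last one. The sketch below assumes Python's standard xml.sax module; the ChunkRecorder class, the <entry> wrapper element, and the sample document are made up for illustration, and the ampersand is assumed to be written as &amp; in the file, since a bare & would not be well-formed XML. How the text is split depends on the underlying parser, so no particular number of chunks is promised.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical sample document; the ampersand is escaped as &amp;.
SAMPLE = b"<entry><key>mykey</key><value>some%20&amp;%20value</value></entry>"


class ChunkRecorder(ContentHandler):
    """Records every characters() and skippedEntity() call it receives."""

    def __init__(self):
        super().__init__()
        self.chunks = []   # one item per characters() call
        self.skipped = []  # one item per skippedEntity() call

    def characters(self, content):
        self.chunks.append(content)

    def skippedEntity(self, name):
        self.skipped.append(name)


recorder = ChunkRecorder()
xml.sax.parseString(SAMPLE, recorder)

# Shows how the parser split the text; the split points are parser-dependent.
print(recorder.chunks)
# Stays empty when the parser resolves the entity itself.
print(recorder.skipped)
```

If the printed list shows the text of <value> arriving in several pieces, the handler is simply overwriting earlier pieces with later ones.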

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca
Jul 18 '05 #2
gh
Ya, skippedEntity() wasn't firing, but you are correct about receiving
three chunks. The characters handler routine is fired 3 times for a
single text block. Why does it do this? Is there a way to prevent
doing this?

Much thanks.

gh
Jul 18 '05 #3
On Wed, 2005-03-16 at 00:14 -0800, gh wrote:
The characters handler routine is fired 3 times for a
single text block. Why does it do this? Is there a way to prevent
doing this?


Continuing in the vein of closing matters cross-posted to XML-SIG:

http://mail.python.org/pipermail/xml...ch/011013.html
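
The linked message is not reproduced here. As a rough sketch of the usual workaround (not necessarily what that message recommends), the handler can buffer the chunks and join them when the element ends. This assumes Python's standard xml.sax module; the TextCollector class, the <entry> wrapper element, and the sample document are hypothetical, and the ampersand is assumed to be escaped as &amp; in the file.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Joins all character chunks of an element before storing the text."""

    def __init__(self):
        super().__init__()
        self._buffer = []
        self.values = {}  # element name -> complete text

    def startElement(self, name, attrs):
        # Start a fresh buffer for each element.
        self._buffer = []

    def characters(self, content):
        # May be called several times for one run of text, so only append here.
        self._buffer.append(content)

    def endElement(self, name):
        # The text is complete only once the end tag has been seen.
        if name in ("key", "value"):
            self.values[name] = "".join(self._buffer)
        self._buffer = []


collector = TextCollector()
xml.sax.parseString(
    b"<entry><key>mykey</key><value>some%20&amp;%20value</value></entry>",
    collector,
)
print(collector.values["value"])  # prints: some%20&%20value
```

This does not stop the parser from calling characters() more than once; it only makes the handler indifferent to how the text is split. The simple reset in startElement is enough for leaf elements like these, but mixed content would need a stack of buffers.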

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com

Jul 18 '05 #4
