
Using Xerces SAX to parse just part of an input stream?

I'm trying to put together code to deal with a SOAP with attachments
response, and I'd like to process the response in a single pass. The
SOAP with attachments specification returns XML in a MIME message, so
it looks like this:

--4389012.48390
Content-Type: text/xml

<?xml version="1.0" encoding="UTF-8"?>
<soap-env:Envelope
xmlns:soap-env="http://schemas.xmlsoap.org/soap/envelope/">
....snip...
</soap-env:Envelope>
--4389012.48390
Content-Type: text/xml
Content-Id: RootNode

<?xml version="1.0" encoding="UTF-8"?><RootNode>
... snip ...
</RootNode>
--4389012.48390--

So what I'd LIKE to be able to do is to parse the incoming input stream
up to the <?xml> declaration, hand the input stream over to a SAX
parser, let it parse to the end of the document, and then have it
return at the end so I can continue parsing the same input stream.

The problem is that "SAXParser.parse( new InputSource( inputStream ),
handler );" appears to want to consume the input stream until it
reaches EOF on the input stream (which, when given the input stream
above, fails with the error message "Content is not allowed in trailing
section."). Is this something I can work around in Xerces, or is there
a better SAX implementation that will let me tell the parser to stop
when it reaches the last element?
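
For reference, the failure can be reproduced with a short sketch like the one below. It assumes the JDK's bundled JAXP parser (a Xerces derivative); the class and method names are made up for illustration, and the exact message text may vary between parser versions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of any library.
class TrailingContentDemo {
    /** Returns true if the parser rejects the MIME boundary that follows the document. */
    static boolean failsOnTrailingContent() throws Exception {
        // One XML part followed directly by the closing MIME boundary.
        String data = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><RootNode/>\n"
                + "--4389012.48390--\n";
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new ByteArrayInputStream(
                            data.getBytes(StandardCharsets.UTF_8))),
                    new DefaultHandler());
            return false; // parser accepted the trailing text
        } catch (SAXParseException e) {
            // The parser keeps reading past </RootNode> and hits the boundary line.
            System.out.println(e.getMessage());
            return true;
        }
    }
}
```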

May 10 '06 #1
Nobody wrote:
> The problem is that "SAXParser.parse( new InputSource( inputStream ),
> handler );" appears to want to consume the input stream until it
> reaches EOF on the input stream (which, when given the input stream
> above, fails with the error message "Content is not allowed in trailing
> section.").


Unfortunately, the XML specification allows only comments, processing
instructions, and whitespace after the document element, so a conforming
parser has to reject anything else it finds there.

Possible solution: Create a stream filter which you pass the
"--4389012.48390" at the start of the enclosed message, and which
delivers characters only until it sees the corresponding
"--4389012.48390" mark at the end, returning EOF thereafter. Run the
parser from that filter-stream rather than direct from your original
input stream.

In other words, sweep the issue under the carpet so the parser doesn't
have to see it.
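
A minimal sketch of such a filter, under some assumptions: the class names `BoundaryInputStream` and `BoundaryDemo` are hypothetical, the delimiter is taken to be a newline followed by the boundary marker, and the header skipping handles LF line endings only (real MIME messages use CRLF). It wraps a `PushbackInputStream` so that bytes read while testing for the delimiter can be put back, and its `close()` does nothing, because the parser may close its input when it finishes and the underlying stream has to stay open for the next part.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical filter: reports EOF once the delimiter bytes are reached. */
class BoundaryInputStream extends InputStream {
    private final PushbackInputStream in; // pushback capacity must be >= delimiter.length
    private final byte[] delimiter;
    private final byte[] seen;
    private boolean finished = false;

    BoundaryInputStream(PushbackInputStream in, byte[] delimiter) {
        this.in = in;
        this.delimiter = delimiter;
        this.seen = new byte[delimiter.length];
    }

    @Override
    public int read() throws IOException {
        if (finished) {
            return -1;
        }
        int matched = 0;
        while (matched < delimiter.length) {
            int b = in.read();
            if (b == -1) {
                break; // underlying stream ended
            }
            seen[matched] = (byte) b;
            if (seen[matched] != delimiter[matched]) {
                // Not the delimiter: push everything back and deliver one byte.
                in.unread(seen, 0, matched + 1);
                return in.read();
            }
            matched++;
        }
        if (matched == delimiter.length || matched == 0) {
            finished = true; // delimiter consumed, or plain end of input
            return -1;
        }
        in.unread(seen, 0, matched); // partial match cut off by end of input
        return in.read();
    }

    @Override
    public void close() {
        // Deliberately leave the underlying stream open for the next part.
    }
}

class BoundaryDemo {
    /** Parses two XML parts from one stream and returns the element names seen. */
    static List<String> parseParts() throws Exception {
        byte[] delimiter = "\n--4389012.48390".getBytes(StandardCharsets.US_ASCII);
        String message = "<a><b/></a>\n--4389012.48390\n"
                + "Content-Type: text/xml\n\n"
                + "<RootNode/>\n--4389012.48390--\n";
        PushbackInputStream in = new PushbackInputStream(
                new ByteArrayInputStream(message.getBytes(StandardCharsets.UTF_8)),
                delimiter.length);
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                names.add(qName);
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.newSAXParser().parse(
                new InputSource(new BoundaryInputStream(in, delimiter)), handler);
        // Skip the next part's headers up to the blank line (simplified: LF only).
        for (int prev = 0, cur; (cur = in.read()) != -1; prev = cur) {
            if (prev == '\n' && cur == '\n') {
                break;
            }
        }
        factory.newSAXParser().parse(
                new InputSource(new BoundaryInputStream(in, delimiter)), handler);
        return names;
    }
}
```

The byte-at-a-time matching is simple rather than fast; a production version would probably override `read(byte[], int, int)` and scan a buffer instead.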
May 10 '06 #2
Thanks - that was pretty much what I've come up with, although I was
hoping for something simpler. Of course, it doesn't look like writing
a SAX parser is all THAT hard...

May 10 '06 #3
Nobody wrote:
> Thanks - that was pretty much what I've come up with, although I was
> hoping for something simpler. Of course, it doesn't look like writing
> a SAX parser is all THAT hard...


XML 1.0 was designed with the goal that writing a parser should be a
task of about the right size for a student project.

Of course that's before namespaces, and schemas, and other things were
added to the mix.

Experience has shown that this is very much a 90/10 problem. You can get
90% of the behavior for 10% of the effort; the other 10% takes the other
90% (or more) of the effort. And making it perform well can add yet
another 90%...
--
Joe Kesselman / Beware the fury of a patient man. -- John Dryden
May 10 '06 #4
