SAX multiple calls to characters()

I am using the function listed below to handle characters events in
SAX. It does not handle multiple sequential calls to this function
correctly. For example, I am getting
"2 4 816 32 64" as a value for an element when processing <vec>2 4 8
16 32 64 </vec>
because I am getting 2 calls to process the text in this element, one
for "2 4 8" and the other for "16 32 64". I have tried appending a
blank to the result after each call to this function, but that
sometimes splits numbers or words, depending on what is passed to this
function.

Is there a better way of handling multiple characters events? Thanks

public void characters(char[] chars, int start, int length) {
    while ( (length > 0) && Character.isWhitespace(chars[start]) ) {
        ++start;
        --length;
    }
    while ( (length > 0) &&
            Character.isWhitespace(chars[start+length-1]) ) {
        --length;
    }
    if ( length > 0 ) {
        _text += new String(chars,start,length);
    }
}

Aug 2 '06 #1
* me*****@rsn.hp.com wrote in comp.text.xml:
> I am using the function listed below to handle characters events in
> SAX. It does not handle multiple sequential calls to this function
> correctly.
Then you need to change that. It is normal for SAX processors to call
the characters() callback multiple times for a single run of text, so
you have to design your code to handle that. One option here is to
simply buffer the data and process it once all of it has been
accumulated (e.g., when the endElement callback is called).
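A minimal sketch of that buffering approach, assuming elements that contain only text (no mixed content). The class name VecTextHandler and the helpers parse() and lastValue() are made up for illustration; only the callbacks and the parser setup come from the standard org.xml.sax and javax.xml.parsers APIs.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of an element across all
// characters() calls and processes it only in endElement().
class VecTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder(); // text of the current element
    private String lastValue;                                // text of the last closed element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // assumption: text-only elements, so start a fresh buffer
    }

    @Override
    public void characters(char[] chars, int start, int length) {
        text.append(chars, start, length); // keep every chunk exactly as delivered
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        lastValue = text.toString().trim(); // trim once, after all chunks have arrived
        text.setLength(0);
    }

    String lastValue() {
        return lastValue;
    }

    // Convenience helper for this example: parse a document held in a string.
    static String parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        VecTextHandler handler = new VecTextHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.lastValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<vec>2 4 8 16 32 64 </vec>"));
    }
}
```

Because the buffer is trimmed only in endElement, the result does not depend on where the parser happens to split the text between calls.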
--
Björn Höhrmann · mailto:bj****@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Aug 3 '06 #2
On 02-08-2006, me*****@rsn.hp.com <me*****@rsn.hp.com> wrote:
> I am using the function listed below to handle characters events in
> SAX. It does not handle multiple sequential calls to this function
> correctly.
> For example, I am getting
> "2 4 816 32 64" as a value for an element when processing <vec>2 4 8
> 16 32 64 </vec>
> because I am getting 2 calls to process the text in this element, one
> for "2 4 8" and the other for "16 32 64".
The expected behavior of a SAX API is to preserve all characters of the
input document, whitespace included. Your problem is either in your code or
(less probably ;-)) in the SAX implementation you used.
> I have tried appending a
> blank to the result after each call to this function, but that
> sometimes splits numbers or words, depending on what is passed to this
> function.
I think this is definitely not a good solution :-)
> Is there a better way of handling multiple characters events? Thanks
> public void characters(char[] chars, int start, int length) {
>     while ( (length > 0) && Character.isWhitespace(chars[start]) ) {
>         ++start;
>         --length;
>     }
>     while ( (length > 0) &&
>             Character.isWhitespace(chars[start+length-1]) ) {
>         --length;
>     }
As far as I understand, this piece of code is responsible for the
behaviour you complain about! You remove the leading and trailing whitespace
from each chunk of characters you receive, so how can you expect to see
that whitespace in the output string?
>     if ( length > 0 ) {
>         _text += new String(chars,start,length);
>     }
> }
Repeated String concatenation with += is bad for performance, because
each call copies the text accumulated so far. You may consider using a
StringBuffer (sb.append(chars, start, length)).
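To make the effect concrete, here is a small sketch that replays two chunks through both strategies without involving a parser. The split into "2 4 8 " and "16 32 64 " is an assumed example (where a real parser splits the text is up to the implementation), the class and method names are made up for illustration, and String.trim() stands in for the Character.isWhitespace loops from the question. StringBuilder is used as the unsynchronized counterpart of StringBuffer.

```java
// Hypothetical demo class: compares trimming every chunk with trimming once.
class ChunkDemo {
    // The approach from the question: trim each chunk, then concatenate.
    static String trimEachChunk(String... chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk.trim()); // whitespace at the chunk boundary is lost here
        }
        return sb.toString();
    }

    // Append each chunk unchanged and trim only the accumulated text.
    static String trimOnce(String... chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            char[] chars = chunk.toCharArray();
            sb.append(chars, 0, chars.length); // same call shape as in characters()
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String[] chunks = {"2 4 8 ", "16 32 64 "}; // assumed split, for illustration
        System.out.println(trimEachChunk(chunks)); // prints "2 4 816 32 64"
        System.out.println(trimOnce(chunks));      // prints "2 4 8 16 32 64"
    }
}
```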
Aug 5 '06 #3
