Bytes | Software Development & Data Engineering Community

SAX parsing goes 'all funny' on value [en]

Hi,

I am parsing a small XML document, and the parsing goes 'all funny'
when it reaches this element: <useragent>Mozilla/4.61 [en] (WinNT; I)</useragent>

I've created a subclass of org.xml.sax.helpers.DefaultHandler, and an
instance of this subclass is set on my
org.apache.xerces.parsers.SAXParser:

SAXParser parser = new SAXParser();
parser.setContentHandler(pdh);
parser.setErrorHandler(pdh);
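For anyone who wants to reproduce this, here is a minimal self-contained sketch. It swaps in the JDK's standard JAXP entry point (javax.xml.parsers.SAXParserFactory) for the Xerces-specific org.apache.xerces.parsers.SAXParser, and the class name ChunkLogger is made up for illustration. It logs every characters() call in the same spirit as the debug output below. How many calls you get, and where the text is split, depends on the parser implementation and its buffering, so this may or may not reproduce the exact split; only the concatenated text is guaranteed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical debug handler: records and prints every characters() event.
final class ChunkLogger extends DefaultHandler {
    final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int offset, int length) {
        // Copy only this event's slice; the rest of ch is parser scratch space.
        String chunk = new String(ch, offset, length);
        chunks.add(chunk);
        System.out.println("D: characters length=" + length
                + ", offset=" + offset + ", found='" + chunk + "'");
    }

    // Parses an XML string with the JDK's default SAX parser.
    static ChunkLogger parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            ChunkLogger handler = new ChunkLogger();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        ChunkLogger h = parse("<useragent>Mozilla/4.61 [en] (WinNT; I)</useragent>");
        // The number of events is implementation-dependent; their concatenation is not.
        System.out.println(h.chunks.size() + " event(s), text='"
                + String.join("", h.chunks) + "'");
    }
}
```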

I've found that the

public void characters(char[] ch, int offset, int length) throws SAXException

method is called once per element parsed; my debug output confirms
this. For example, when parsing <useragent>MobileExplorer/3.00 (Mozilla/1.22;
compatible; MMEF300; Microsoft; Windows; GenericLarge)</useragent> it
reads:

D: reading characters...(useragent) length=89, offset=721, found='MobileExplorer/3.00 (Mozilla/1.22; compatible; MMEF300; Microsoft; Windows; GenericLarge)'
D: ending element (useragent) current element value is : [MobileExplorer/3.00 (Mozilla/1.22; compatible; MMEF300; Microsoft; Windows; GenericLarge)]

But... when parsing <useragent>Mozilla/4.61 [en] (WinNT; I)</useragent>
the debug output reads:

D: reading characters...(useragent) length=16, offset=1097, found='Mozilla/4.61 [en'
D: reading characters...(useragent) length=1, offset=0, found=']'
D: reading characters...(useragent) length=11, offset=1114, found=' (WinNT; I)'
D: ending (useragent) current element value is : [ (WinNT; I)]

It calls the characters method three times?!
Does the [en] bit in the element value have anything to do with this?
I would like to understand what is happening and why.

(As a 'temp fix' I thought I would have my DefaultHandler's characters(...)
method concatenate the characters read until endElement(...) is
invoked, but that seems to break everything.)
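The original handler code isn't shown, so what follows is only a sketch of the usual accumulate-until-endElement pattern, using a hypothetical class name, TextCollector. Two common ways this pattern 'breaks everything' are forgetting to clear the buffer between elements and building each chunk with new String(ch) instead of new String(ch, offset, length). In the log above, the ']' chunk arrives with offset=0, so it probably sits in a different array from its neighbours. A plausible reason for that split is the parser checking for the sequence ']]>', which XML does not allow in character data, but that is a guess about parser internals. The main method replays the three chunks from that log, each surrounded by junk characters, to show that chunk boundaries no longer matter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: accumulates every characters() chunk and only
// uses the text once endElement() says the element is complete.
final class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    // Completed "elementName=value" pairs, in document order.
    final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start each element with an empty buffer
    }

    @Override
    public void characters(char[] ch, int offset, int length) {
        // Only ch[offset .. offset + length) belongs to this event.
        text.append(ch, offset, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        values.add(qName + "=" + text);
        text.setLength(0);
    }

    // Delivers one chunk inside a larger array full of junk, the way a
    // parser hands out slices of its internal buffer.
    private static void feed(TextCollector h, String chunk, int pad) {
        char[] buf = new char[pad + chunk.length() + pad];
        Arrays.fill(buf, '#');
        chunk.getChars(0, chunk.length(), buf, pad);
        h.characters(buf, pad, chunk.length());
    }

    public static void main(String[] args) {
        TextCollector h = new TextCollector();
        h.startElement("", "useragent", "useragent", new AttributesImpl());
        feed(h, "Mozilla/4.61 [en", 5);
        feed(h, "]", 0);
        feed(h, " (WinNT; I)", 3);
        h.endElement("", "useragent", "useragent");
        System.out.println(h.values.get(0)); // prints useragent=Mozilla/4.61 [en] (WinNT; I)
    }
}
```

This is enough for leaf elements like <useragent>. For mixed content (text interleaved with child elements) you would typically keep a stack of builders, one per open element.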

Thanks for your input.
Fred.
Jul 20 '05 #1
Fred wrote:
> (As a 'temp fix' I thought to have the DefaultHandlers characters(...)
> method concatenate characters read, till the endElement(...) is
> invoked; but that seems to break everything.)


I think that's how SAX is supposed to work. There's no guarantee that
you'll get only a single characters() event for a given piece of text.
Jul 20 '05 #2
Julian Reschke <ju************@gmx.de> wrote in
news:3F**************@gmx.de:
> Fred wrote:
>> (As a 'temp fix' I thought to have the DefaultHandlers characters(...)
>> method concatenate characters read, till the endElement(...) is
>> invoked; but that seems to break everything.)
>
> I think that's how SAX is supposed to work. There's no guarantee that
> you're only getting a single event here.


It *is* how SAX is supposed to work. Keep in mind that character data in
XML can be arbitrarily long; if a parser had to deliver it in a single
chunk, it could find itself constantly allocating and reallocating
buffers. Dropping that requirement greatly simplifies buffer management:
the parser can use a fixed-size internal buffer and simply call the
character handler whenever everything up to the end of the buffer is
character data, rather than having to shift everything around. That can
speed up parsing considerably.
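To make the buffering argument concrete, here is a toy sketch. It is not how Xerces or any real parser is written: it ignores markup entirely and just reads text through a fixed-size buffer, firing one characters()-style callback per buffer fill and reusing the same array each time. FixedBufferScanner and ChunkSink are hypothetical names, and the split points are purely an artifact of the 8-character buffer size. Because the array is overwritten on the next fill, a handler must copy the slice it is given (as the lambda does with new String(ch, off, len)) rather than keep a reference to ch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: ignores markup and treats the whole input as text.
final class FixedBufferScanner {
    // Hypothetical callback with the same shape as SAX's characters().
    interface ChunkSink {
        void characters(char[] ch, int offset, int length);
    }

    // Allocated once and reused for every refill; never grown.
    private final char[] buf;

    FixedBufferScanner(int bufferSize) {
        buf = new char[bufferSize];
    }

    void scan(Reader in, ChunkSink sink) {
        try {
            int n;
            // Each refill of the buffer becomes one event, so long text is
            // reported in several chunks instead of being copied into an
            // ever-growing array.
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                if (n > 0) {
                    sink.characters(buf, 0, n);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> chunks = new ArrayList<>();
        new FixedBufferScanner(8).scan(
                new StringReader("Mozilla/4.61 [en] (WinNT; I)"),
                // Copy the slice: the next refill overwrites buf.
                (ch, off, len) -> chunks.add(new String(ch, off, len)));
        System.out.println(chunks); // prints [Mozilla/, 4.61 [en, ] (WinNT, ; I)]
    }
}
```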
Jul 20 '05 #3
