SAX parsing goes 'all funny' on value [en]

Hi,

I am parsing a small XML document and the parsing goes 'all funny'
when parsing this element: <useragent>Mozilla/4.61 [en] (WinNT; I)</useragent>

I've created a subclass of org.xml.sax.helpers.DefaultHandler, and an
instance of this subclass is set on my
org.apache.xerces.parsers.SAXParser:

SAXParser parser = new SAXParser();
parser.setContentHandler(pdh);
parser.setErrorHandler(pdh);

I've found that the

public void characters(char[] ch, int offset, int length) throws
SAXException

method is called once per element parsed. My debug output confirms
this, e.g. when parsing <useragent>MobileExplorer/3.00 (Mozilla/1.22;
compatible; MMEF300; Microsoft; Windows; GenericLarge)</useragent> it
reads:

D: reading characters...(useragent) length=89, offset=721,
found='MobileExplorer/3.00 (Mozilla/1.22; compatible; MMEF300;
Microsoft; Windows; GenericLarge)'
D: ending element (useragent) current element value is :
[MobileExplorer/3.00 (Mozilla/1.22; compatible; MMEF300; Microsoft;
Windows; GenericLarge)]
But... when parsing <useragent>Mozilla/4.61 [en] (WinNT; I)</useragent>
the debug output reads:

D: reading characters...(useragent) length=16, offset=1097, found='Mozilla/4.61 [en'
D: reading characters...(useragent) length=1, offset=0, found=']'
D: reading characters...(useragent) length=11, offset=1114, found=' (WinNT; I)'
D: ending (useragent) current element value is : [ (WinNT; I)]

It calls the characters method thrice?!
Does the [en] bit in the element value have anything to do with this?
Would like to understand what and why.

(As a 'temp fix' I thought to have the DefaultHandler's characters(...)
method concatenate characters read, till the endElement(...) is
invoked; but that seems to break everything.)

Thanks for your input.
Fred.
Jul 20 '05 #1
Fred wrote:
(As a 'temp fix' I thought to have the DefaultHandler's characters(...)
method concatenate characters read, till the endElement(...) is
invoked; but that seems to break everything.)


I think that's how SAX is supposed to work. There's no guarantee that
you're only getting a single event here.
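
For illustration, here is a minimal sketch of the concatenating approach described in the question. The class name UserAgentHandler and its parse helper are hypothetical, and it uses the JDK's built-in javax.xml.parsers parser rather than the Xerces SAXParser class from the original post. It assumes the elements of interest hold text only (no child elements). Two details commonly go wrong in hand-rolled versions: ignoring the offset and length arguments, and not clearing the buffer between elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <useragent> element.
class UserAgentHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> userAgents = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Start each element with an empty buffer.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int offset, int length) {
        // May be called several times per element: append only the
        // [offset, offset + length) slice, never the whole array.
        text.append(ch, offset, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only here is the element's text known to be complete.
        if ("useragent".equals(qName)) {
            userAgents.add(text.toString());
        }
        text.setLength(0);
    }

    List<String> getUserAgents() {
        return userAgents;
    }

    // Convenience wrapper; checked exceptions are rethrown unchecked
    // to keep the sketch short.
    static List<String> parse(String xml) {
        try {
            UserAgentHandler handler = new UserAgentHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getUserAgents();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

Calling UserAgentHandler.parse(...) on a document containing the two useragent elements from the question should return both values whole, however many characters(...) calls the parser makes for each.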
Jul 20 '05 #2
Julian Reschke <ju************ @gmx.de> wrote in
news:3F******** ******@gmx.de:
Fred wrote:
(As a 'temp fix' I thought to have the DefaultHandler's characters(...)
method concatenate characters read, till the endElement(...) is
invoked; but that seems to break everything.)


I think that's how SAX is supposed to work. There's no guarantee that
you're only getting a single event here.


It *is* how SAX is supposed to work. Keep in mind that character data in
XML can be arbitrarily long; if a parser had to deliver character data in a
single chunk, it could find itself constantly allocating and reallocating
buffers. Not imposing such a requirement greatly simplifies buffer
management in a parser; it can use a fixed-size internal buffer and just
call the character handler when everything up to the end of the buffer is
character data, rather than having to shift everything around. That can
greatly speed up parsing.
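
The fixed-size buffer idea can be shown with a toy model. This is not how Xerces or any real parser is implemented, and it does not try to reproduce the exact split points in the debug output above; the names ChunkedDelivery, CharacterSink, deliver and chunks are all made up for the sketch. It only shows why a producer with one reusable buffer naturally emits several characters(...) style callbacks for one run of text.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a producer that reuses one fixed-size buffer.
class ChunkedDelivery {
    // Hypothetical stand-in for the characters(...) callback.
    interface CharacterSink {
        void characters(char[] ch, int offset, int length);
    }

    // Delivers data in pieces of at most bufferSize characters,
    // refilling the same array instead of allocating a larger one.
    static void deliver(String data, int bufferSize, CharacterSink sink) {
        char[] buffer = new char[bufferSize];
        int pos = 0;
        while (pos < data.length()) {
            int n = Math.min(bufferSize, data.length() - pos);
            data.getChars(pos, pos + n, buffer, 0);
            // The buffer is overwritten on the next pass, so the sink
            // has to copy what it wants to keep.
            sink.characters(buffer, 0, n);
            pos += n;
        }
    }

    // Collects the delivered pieces so they can be inspected.
    static List<String> chunks(String data, int bufferSize) {
        List<String> out = new ArrayList<>();
        deliver(data, bufferSize, (ch, off, len) -> out.add(new String(ch, off, len)));
        return out;
    }
}
```

With a 16-character buffer, the 28-character value from the question arrives in two pieces; concatenating the pieces gives back the original text, which is why appending in characters(...) and reading the result in endElement(...) is the safe pattern.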
Jul 20 '05 #3
