Xerces advanced usage - progress, random access, etc.

Kza
Hi, I am currently using the Xerces SAX parser for C++ (I use DOM too, but
I think SAX is more relevant here) for processing and displaying fairly
large XML files. Usually I give Xerces a filename, and it parses it and
that's all good. But the customer needs more features.

Feature 1: A progress display. I have tried a few times now to find a
way of asking Xerces how far through a file it is in bytes, but no
luck. (I did try a per-element check, but that involves a whole extra
parse at the start just to count the elements.) I have tried using the
LocalFileInputSource, getting its BinInputStream and calling its
curPos, but it's always 0.

Any ideas?

Feature 2: Loading only a "screenful" of the file at a time. I also
would like some sort of random access functionality, so if the user
scrolls down to 75% of the file, the parser skips forward to that
position and starts reading there, and when they scroll back up it goes
up and reads just that little bit of the file.

I am pretty sure feature 1 is possible with normal Xerces SAX, but I
have no idea how. The documentation is very sparse: it names the
functions but does not actually say what they do or how they should
be used.

For feature 2 it might be more complicated. A colleague mentioned some
other "object models" like xparse and xalaron (not sure how that's
pronounced or spelt), some Apache project that parses XML in a random
access fashion.

Anyone got any ideas?

Thanks a lot.

Sep 4 '06 #1
Kza wrote:
> Feature 1: A progress display.

The SAX APIs can be persuaded to give line/column information, though
unless you know how many lines there were in the file before you started
parsing it, that doesn't do you any good. Look at the Locator API.

The DOM assumes reading the file is a single operation, so the concept
of getting incremental details doesn't make much sense. You *could* plug
in a stream filter between wherever the file is being read from and the
parser, and set up that filter so it counts characters going by --
that's going to give you only a very rough progress indication, and
again it requires that you know the length before you start if you want
to report it as a percentage-complete number.
> Feature 2: Loading only a "screenful" of the file at a time.

"Screenful" is not defined in XML. Nor is starting a parse from the middle
of a file. You could try to do something with incremental processing,
via throttling of a SAX stream -- I've done that in the past -- but
keeping track of when enough has been read to fill a screen and when
more would have to be read to fill the next screen is very much an
application problem rather than a parser problem.
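
The throttling idea could be sketched roughly as below. Xerces-C++ does offer progressive scanning through parseFirst()/parseNext(), but to keep the sketch compilable without the library it uses a hypothetical StepFn callback in its place: each call parses a little more and appends any complete rows. ScreenPager and the notion of a "row" are assumptions made for illustration; deciding what counts as a row stays with the application.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one progressive parse step: parses a bit more
// input, appends any completed rows to `out`, returns false at end of input.
using StepFn = std::function<bool(std::vector<std::string>& out)>;

// Pulls from the parser only until one more screenful is buffered.
class ScreenPager {
public:
    ScreenPager(StepFn step, std::size_t rowsPerScreen)
        : step_(std::move(step)), rowsPerScreen_(rowsPerScreen) {}

    // Returns the rows for the next screen (fewer at the end of input).
    std::vector<std::string> nextScreen() {
        while (!done_ && pending_.size() < rowsPerScreen_) {
            if (!step_(pending_)) done_ = true;  // parser reported end of input
        }
        const auto n = static_cast<std::ptrdiff_t>(
            std::min(rowsPerScreen_, pending_.size()));
        std::vector<std::string> screen(pending_.begin(), pending_.begin() + n);
        pending_.erase(pending_.begin(), pending_.begin() + n);
        return screen;
    }

    // True once the input is exhausted and every buffered row was handed out.
    bool finished() const { return done_ && pending_.empty(); }

private:
    StepFn step_;
    std::size_t rowsPerScreen_;
    std::vector<std::string> pending_;  // rows parsed but not yet displayed
    bool done_ = false;
};
```

This only moves forward: rows already shown are dropped, so scrolling back up would still need a cache or a reparse.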

Random-access to an XML model isn't a problem -- the DOM can do that,
though again it isn't designed to operate on screenfuls -- but
random-order parsing really doesn't make sense. Namespaces are
context-dependent, to take one major point where that idea breaks down.
--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Sep 5 '06 #2
"Kza" <kz****@gmail.com> writes:
> Feature 1: A progress display. I have tried a few times now to find a
> way of asking xerces how far through a file it is in bytes, but no
> luck. (I did try a per element check, but that involves a whole extra
> parse at the start just to count the elements). I have tried using the
> LocalFileInputSource, and getting its BinInputStream and calling its
> curPos, but it's always 0.
>
> Any ideas?

You can implement your own InputStream which will keep track of how
much data Xerces-C++ has consumed so far. Combine this with the total
length of the file and you can calculate the progress.
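
A minimal sketch of that counting wrapper follows. In real Xerces-C++ code the wrapper would derive from BinInputStream and be handed to the parser by a custom InputSource whose makeStream() returns it; here a reduced stand-in interface (ByteStream, a hypothetical name) is used so the example compiles without the library. MemoryStream exists only to exercise the wrapper. Since makeStream() creates a new stream on each call, a stream fetched separately from the one the parser reads would be expected to stay at position 0, which may be what the original attempt ran into.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Reduced stand-in for Xerces-C++'s BinInputStream (hypothetical interface).
struct ByteStream {
    virtual ~ByteStream() = default;
    // Copies up to maxToRead bytes into toFill; returns bytes copied, 0 at end.
    virtual std::size_t readBytes(unsigned char* toFill, std::size_t maxToRead) = 0;
};

// In-memory source, used here only to drive the wrapper.
class MemoryStream : public ByteStream {
public:
    explicit MemoryStream(std::string data) : data_(std::move(data)) {}
    std::size_t readBytes(unsigned char* toFill, std::size_t maxToRead) override {
        const std::size_t n = std::min(maxToRead, data_.size() - pos_);
        std::memcpy(toFill, data_.data() + pos_, n);
        pos_ += n;
        return n;
    }
private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Forwards reads to another stream and counts the bytes handed to the caller.
class CountingStream : public ByteStream {
public:
    CountingStream(ByteStream& inner, std::uint64_t totalBytes)
        : inner_(inner), total_(totalBytes) {}
    std::size_t readBytes(unsigned char* toFill, std::size_t maxToRead) override {
        const std::size_t n = inner_.readBytes(toFill, maxToRead);
        consumed_ += n;
        return n;
    }
    std::uint64_t consumed() const { return consumed_; }
    // Percentage of the input consumed so far, clamped to 0..100.
    int percent() const {
        if (total_ == 0) return 100;
        return static_cast<int>(
            std::min<std::uint64_t>(100, consumed_ * 100 / total_));
    }
private:
    ByteStream& inner_;
    std::uint64_t total_;
    std::uint64_t consumed_ = 0;
};
```

Because a parser reads ahead in buffered chunks, the figure reflects bytes fetched rather than bytes actually processed, so it is a rough indicator only.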

> Feature 2: Loading only a "screenful" of the file at a time. I also
> would like some sort of random access functionality, so if the user
> scrolls down to 75% of the file, the parser skips forward to that
> position and starts reading there, and when they scroll back up it goes
> up and reads just that little bit of the file.

This one would definitely be easier with an in-memory model (e.g., DOM).
hth,
-boris
--
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding
Sep 8 '06 #3
Kza
Just as an update here, and I hope top posting is de rigueur for this
newsgroup,

I solved feature one with Xerces' getSrcOffset() method, although I
had to wrap it with an exception catcher, as the particular version we
are using at work at the moment throws an exception when parsing is
finished (but before the parse method returns) and there's no other way
to find out when it's finished.
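
For anyone wanting the shape of that workaround, here is a rough sketch. The callable passed in is a hypothetical wrapper around the parser's getSrcOffset() call, and the total size is assumed to come from the application (for example from the file system). Treating any exception as "finished" mirrors the behaviour described above; it is an assumption that also hides genuine errors, so a real version would probably catch a narrower type.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>

// Returns progress in the range [0, 1].
// `currentOffset` is a hypothetical callable wrapping the parser's offset query.
// Any exception it throws is interpreted as "parsing has finished".
inline double progressFraction(const std::function<std::uint64_t()>& currentOffset,
                               std::uint64_t totalBytes) {
    if (totalBytes == 0) return 1.0;  // nothing to parse
    try {
        const std::uint64_t offset = currentOffset();
        if (offset >= totalBytes) return 1.0;  // clamp read-ahead overshoot
        return static_cast<double>(offset) / static_cast<double>(totalBytes);
    } catch (...) {
        return 1.0;  // assumption: the query only throws once the parse is done
    }
}
```

A progress bar would call this from a handler callback or a timer while the parse is running.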

Feature 2 I don't have a solution for at the moment. DOM is not an
option, as the whole point is that a whole file uses up too much memory
and DOM loads the whole thing at once; that's why we wanted to load in a
section at a time.

If it turns out to be really important to analyse large files, I will just
have to write a separate program that uses SAX and maybe only filters
for certain things, or perhaps reparses when people want to "scroll up",
which trades time for memory. It's up to the
customers really. I suspect the real solution is a non-XML indexed
binary format. But the memory issue isn't actually as big as the
customers think it is. I will work something out.
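
The "reparse and filter" option might look something like the sketch below: on every scroll the file is parsed again from the start, but only the records inside the visible window are kept, so memory stays bounded by the window size. WindowCollector and onRecord() are hypothetical names; onRecord() is assumed to be called from a SAX handler each time one displayable record has been completed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Keeps only the records whose index falls inside [first, first + count).
class WindowCollector {
public:
    WindowCollector(std::size_t first, std::size_t count)
        : first_(first), count_(count) {}

    // Call once per completed record. Returns false once the window is full,
    // so the caller can abandon the rest of the parse.
    bool onRecord(const std::string& text) {
        if (seen_ >= first_ && rows_.size() < count_) rows_.push_back(text);
        ++seen_;
        return rows_.size() < count_;
    }

    const std::vector<std::string>& rows() const { return rows_; }
    std::size_t seen() const { return seen_; }  // records visited so far

private:
    std::size_t first_;
    std::size_t count_;
    std::size_t seen_ = 0;
    std::vector<std::string> rows_;
};
```

The cost is the time trade-off mentioned above: showing a window near the end of the file means parsing everything before it again.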

Sep 8 '06 #4
