Parsing multiple XML trees?

I have a server program that takes commands and acts on them. The
server program can also take these commands from an input file or
standard input (mainly for testing purposes). As such, I often have
files full of input commands to feed to the server.

Right now the commands that the server takes are well-defined, but not
in XML. Since the commands are not self-delimiting, I have to prepend
each command with a 'length' number indicating how many chars the
command takes.

I would like to change the server to accept XML commands, and provide
a DTD (or Schema or RelaxNG or ...) to ensure that the server only
receives valid commands.

My question is this: Can I take the length number out of my input
files & network commands? Since XML is self-delimiting (tags must
balance) this should be possible. However, every time I try to run a
Xerces (Java) parser on a file full of XML commands (with no length
info), it silently discards all but the first command.

I guess what I want to know is, can Xerces take an input stream full
of multiple XML trees and give me each XML tree in turn w/o discarding
any of them? (I can use either SAX or DOM or SAX2 to accomplish this.)

Several friends have suggested that I wrap the entire input file
around a <root> tag, which would make the series of commands into one
big giant happy XML file. I suppose that could work, but that has
several problems: (1) it requires a different DTD to handle multiple
commands than it does to handle one command. (2) as a server it
precludes me from using DOM since I need to act on each command before
the entire stream has been parsed.

Maybe this is the wrong forum to ask, but it's not clear what the
right forum would be. Is this feature covered in SAX? DOM? Is it
specific to Xerces?

~David Svoboda
Dec 15 '05 #1

David Svoboda wrote:

> However, every time I try to run a
> Xerces (Java) parser on a file full of XML commands (with no length
> info), it silently discards all but the first command.
>
> Several friends have suggested that I wrap the entire input file
> around a <root> tag, which would make the series of commands into one
> big giant happy XML file. I suppose that could work, but that has
> several problems: (1) it requires a different DTD to handle multiple
> commands than it does to handle one command. (2) as a server it
> precludes me from using DOM since I need to act on each command before
> the entire stream has been parsed.


One of the requirements for markup to be called XML is a single root
element. Thus, if you want to process some markup with XML tools, you
need to have a single root element, e.g.
<commands>
<command />
<command />
</commands>
If you have, e.g.,
<command />
<command />
then that is not XML, because it is not well-formed markup.
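
The difference between the two snippets can be checked with any conforming parser. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree (not Xerces, which the original question uses); the helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised, for example, when a second top-level element follows
        # the first one.
        return False


# Two sibling elements with no enclosing root element.
bare = "<command/>\n<command/>"

# The same two elements wrapped in a single root element.
wrapped = "<commands>\n<command/>\n<command/>\n</commands>"

print(is_well_formed(bare))     # False
print(is_well_formed(wrapped))  # True
```

A parser that reports an error here, rather than quietly stopping after the first element, makes the problem much easier to spot.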
--

Martin Honnen
http://JavaScript.FAQTs.com/
Dec 15 '05 #2
Martin Honnen wrote:
> David Svoboda wrote:
>> However, every time I try to run a
>> Xerces (Java) parser on a file full of XML commands (with no length
>> info), it silently discards all but the first command.
>>
>> Several friends have suggested that I wrap the entire input file
>> around a <root> tag, which would make the series of commands into one
>> big giant happy XML file. I suppose that could work, but that has
>> several problems: (1) it requires a different DTD to handle multiple
>> commands than it does to handle one command. (2) as a server it
>> precludes me from using DOM since I need to act on each command before
>> the entire stream has been parsed.
>
> One of the requirements of markup to be called XML is a single root
> element thus if you want to process some markup with XML tools then you
> need to have a single root element e.g.
> <commands>
> <command />
> <command />
> </commands>
> if you have e.g.
> <command />
> <command />
> then that is not XML as that is not well-formed markup.


So does that mean if I'm running a server I can only send it one XML
command? That seems to mean that sending multiple XML commands is invalid.

What if a client sends two XML commands really quickly, and my server
'forgets' the second one? How does my server 'pop' exactly one XML
command off the socket?
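
One possible way to reconcile the single-root rule with acting on each command as it arrives is to treat the whole connection as one document and hand over each child of the root as soon as its end tag is seen. The sketch below assumes Python's standard xml.etree.ElementTree.XMLPullParser rather than Xerces; the function name iter_commands and the wrapper name "commands" are hypothetical, and the wrapper's start tag is supplied by the server so clients still send bare commands.

```python
import xml.etree.ElementTree as ET


def iter_commands(chunks, wrapper="commands"):
    """Yield each top-level command element as soon as its end tag arrives.

    chunks is any iterable of text fragments, e.g. pieces read from a
    socket. The wrapper start tag is fed by this function (an assumption
    of this sketch), and the wrapper is never closed, so the stream can
    continue indefinitely.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(f"<{wrapper}>")  # synthetic root element
    # Newer Python/Expat versions may defer parsing of partial tokens;
    # flush(), where it exists, asks the parser to process what it has.
    flush = getattr(parser, "flush", None)
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)
        if flush is not None:
            flush()
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:  # a direct child of the wrapper just closed
                    yield elem


# Fragments as they might arrive from a client; the second command is
# split across two reads.
incoming = ["<command id='1'/>\n", "<command id='2'>", "<arg>x</arg></command>"]
for command in iter_commands(incoming):
    print(command.tag, command.get("id"))
```

Because the generator yields inside the read loop, the first command is available before later fragments have been read. A long-running server would also want to discard handled elements, since the synthetic root keeps a reference to every child.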
~Dave
Dec 15 '05 #3
David Svoboda wrote:
> Maybe this is the wrong forum to ask, but it's not clear what the
> right forum would be. Is this feature covered in SAX? DOM? Is it
> specific to Xerces?


I'm not sure this will be at all helpful, but we confronted this same
issue when designing an XML parsing extension to gawk. If XMLMODE is
positive, we allow only a single XML document to be parsed. But if
XMLMODE is negative, we parse a stream of concatenated documents
(issuing an "ENDDOCUMENT" event between documents).

We do this using the expat parser. The basic approach is to keep
parsing until an error is encountered. When we get a parse error, we
check to see whether the current parse depth is 0 and more than 0
elements have been parsed already. If so, we infer that we are done
parsing a single XML document, so we issue the "ENDDOCUMENT" event and
try to proceed with the next document. We do that by calling the
XML_GetCurrentByteIndex() function to determine where in the input the
error occurred. We use that offset value to identify where in the input
to attempt to start parsing a new document.
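
The approach described above can be sketched roughly as follows. This is not the xgawk code: it is an illustration in Python using the standard xml.parsers.expat binding, where the parser attribute ErrorByteIndex plays the role of the byte offset obtained in C. The function name split_documents is made up, and for simplicity the sketch works on a complete byte string rather than a live stream.

```python
import xml.parsers.expat as expat


def split_documents(data):
    """Split bytes holding concatenated XML documents into one slice each."""
    docs = []
    start = 0
    while data[start:].strip():
        state = {"depth": 0, "elements": 0}

        def on_start(name, attrs):
            state["depth"] += 1
            state["elements"] += 1

        def on_end(name):
            state["depth"] -= 1

        # An expat parser cannot be reused after an error, so each
        # document gets a fresh one.
        parser = expat.ParserCreate()
        parser.StartElementHandler = on_start
        parser.EndElementHandler = on_end
        try:
            parser.Parse(data[start:], True)
            docs.append(data[start:])  # the remainder was one document
            break
        except expat.ExpatError:
            offset = parser.ErrorByteIndex
            # Depth 0 with at least one element parsed: assume the error
            # marks the start of the next document, not a real defect.
            if state["depth"] == 0 and state["elements"] > 0 and offset > 0:
                docs.append(data[start:start + offset])
                start += offset
            else:
                raise
    return docs


stream = b"<a/>\n<b>x</b>\n<c/>\n"
for doc in split_documents(stream):
    print(doc.strip())
```

An error at any other depth, such as a mismatched tag inside a command, is re-raised, so genuinely malformed input is still reported.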

If that's of any interest, you can take a look at the code here:
http://sourceforge.net/projects/xmlgawk
This could be directly useful (if you want to use xgawk's XML
extension), or the code may serve as a guide for how to implement this
in your environment.

Regards,
Andy

Dec 16 '05 #4
