On Jun 13, 2:25 pm, Jan Hidders <hidd...@gmail. comwrote:
On 13 jun, 21:10, Vadim Tropashko <vadimtro_inva. ..@yahoo.comwro te:
On Jun 13, 9:19 am, Jan Hidders <hidd...@gmail. comwrote:
Ok. I'm going to assume the following DTD (in a notation of my own
making to make it a bit more readable) for the syntax tree. (It uses
no attributes to keep things simple):
<stat_bl--<decl_kw<decl_i tem_list<begin_ kw<statements>
<semicol_kw<e nd-kw>
<decl_kw--EMPTY
<begin_kw--EMPTY
<end_kw--EMPTY
<semicol_kw--EMPTY
<statements--( <stat_bl| <assignment)+
<assignment--<var<assign_k w( <var| <number) <semicol_kw>
<var--PCDATA
<assign_kw--EMPTY
<number--PCDATA
<decl_item_li st--( <var<type<semic ol_kw)+
<type--PCDATA
All right, the DTD is a bastardized [context free?] grammar describing
a language that XML document is an element of. Although the adjectives
"cumbersome " and "ugly" still apply, I grudgingly admit that the idea
that a grammar fits into DTD effortlessly is quite powerful (so I'm
removing the "XML sucks" image from my homepage:-)
Of course, it combines ugliness with great power. Welcome to the
Dark Side. :-)
This raises some questions for me. Does the above mean we
should consider comparing XML and its associated ... whadayacallits?
XPath, XSL, etc.?, against more traditional parsing techniques?
*** What, if anything, would that leave out? ***
It seems clear enough, the idea of regarding an XML document
as a syntax tree, and the DTD as a grammar. In which case,
*XML*parsing is then the analog of ... what? Converting an
ascii version of a syntax tree into a binary syntax tree?
Sort of like parsing s-expressions?
Then XML querying and updating is comparable to operations
we would do once we had a syntax tree, which seems to
be multipass traversal and transformation. Think of an
interpreter, which traverses the tree, carrying out instructions
as it does so. The more complex case is a compiler
with multiple analysis, optimizer, and code generator passes.
These involve generating additional data structures which
may or may not be trees as well. A symbol table in a
lexically scoped language likely would be; an object file
likely would not be.
If that analysis is valid, then I'd be interested to continue to
a comparison with other techniques for accomplishing the
same thing, particularly from the standpoint of expressive
power, but also from the standpoint of convenience and
generality.
Other techniques I can think of are:
1) ad hoc traversal of an object graph in an OO language
2) pattern matching over abstract data types, as introduced
by SML. (The gold standard.)
3) "Tree walking" a la Terrance Parr / Antlr (the upstart.)
1) is what I am most familiar with. It isn't particularly pleasant,
but I suppose it's no more objectionable than anything else
in an OOPL. One ends up using the Visitor pattern a lot,
and one necessarily finds oneself having to defeat the
type system, since OOPLs are organized around extensible
structures w/ fixed operations, whereas trees work best
with fixed structures w/ extensible operations.
2) I would be very interested to hear a comparison with
pattern matching from someone who is familiar with both
techniques. One thing I can see immediately is that XPath
expressions allow one to address only some specific
part of a tree that one is interested in, whereas pattern
matching requires whole-tree analysis. How much of an
issue is this in practice?
3) is interesting. I admit that when I first heard about the
idea I was extremely skeptical. The idea that one would
write a grammar to abstract operating on a tree one had
generated by using a grammar to abstract over a language
seemed perverse and unnecessarily complex at first.
However in practice it's turned out to be the opposite:
it is extremely simple. Tree grammars are a fraction
of the size of the original grammar, possible similarly
to the size of an XPath expression to a DTD.
I suppose there are also term rewriting languages but I'm
not as familiar with them. I am also vaguely aware of
"strategies " which are abstractions of data structure
traversal techniques. I suppose further there are the
various tree encoding techniques of SQL, but I haven't found
these to be very exciting. Perhaps a more complete
relational language ...
Btw., if you forget about sequence order, document order and value
comparisons for a moment, the core of XPath is actually a relatively
neat calculus of binary relations:
- p_1/p_2 is the concatenation of binary relations
- p_1[p_2] is the selection of pairs in p_2 whose right-hand side
matches a left-hand side in p_2 (semijoin anyone?)
- p_1 union p_2 is the set union of two binary relations
- p_1 intersect p_2 is the set intersection of two binary relations
- p_1 except p_2 is the set difference between binary relations
Did I mention Tarski already? :-) Of course we had to wait until
XPath2.0 until the three set operations were all allowed everywhere,
but we have them now.
I guess we're talking about Relation Algebra here? I don't know
enough about it to know if this is just an interesting comparison
of if there's some specific result being alluded to.
My hope is that this post will stimulate an educated and informative
discussion of the relative merits of different approaches to
tree traversal, not neglecting to differentiate among the needs
of diverse use cases.
Hey, it could happen.
Marshall