473,789 Members | 2,926 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Hierarchical query

Reposting with more clarification (as Jan asked).

Suppose I have a BNFgrammar and a source text parsed into a tree. How
would I query an
identifier declaration?

All the XQuery tutorials (where I went to gather some ideas) start
with simpleminded examples like browsing all the descendants of /
bookstore/book (and the bookstore XML design is such wrong idea to
boot).

Perhaps some example is needed. A simplified grammar:

statement_block :
'declare'
declaration_ite m_list
'begin'
statements
'end'
;

statements:
(statement_bloc k | assignment) statements |
;

assignment:
identifier ':=' (identifier | number) ';'
;

declaration_ite m_list:
identifier 'integer' ';'
;

Suppose we parse the following text

declare -- token #1
i integer; -- tokens 2,3,4
begin
i := 1; -- tokens 6,7,8,9
end;

So that we get the derivation tree like this:

1 statement_block
1.1 'declare' (matches token #1)
1.2 declaration_ite m_list
1.2.1 identifier (matches token #2)
1.2.2 'integer' (matches token #3)
1.2.3 ';' (matches token #4)
1.3 'begin' (matches token #5)
1.4 statements
1.4.1 assignment
1.4.1.1 identifier
1.4.1.2 ':='
1.4.1.3 number
1.4.1.4 ';'
1.4.2 statements (matches empty token)
1.5 'end';
Now, given a derivation tree and the leaf node identifier 1.4.1.1
corresponding to the token #6 which is variable i, how do we find the
node 1.2.1 that declares it?

Jun 13 '07
30 2370
On Jun 13, 2:25 pm, Jan Hidders <hidd...@gmail. comwrote:
Btw., if you forget about sequence order, document order and value
comparisons for a moment, the core of XPath is actually a relatively
neat calculus of binary relations:
- p_1/p_2 is the concatenation of binary relations
- p_1[p_2] is the selection of pairs in p_2 whose right-hand side
matches a left-hand side in p_2 (semijoin anyone?)
- p_1 union p_2 is the set union of two binary relations
- p_1 intersect p_2 is the set intersection of two binary relations
- p_1 except p_2 is the set difference between binary relations

Did I mention Tarski already? :-) Of course we had to wait until
XPath2.0 until the three set operations were all allowed everywhere,
but we have them now.
Yes, and you have been credited for that (by me removing that XML
banner:-).

Well, the question originated from the world of Kleene algebras (which
Tarski relation algebra is an instance of), idempotent semirings, all
the other language theory goodies. I was temorarily confused by the
concept of the "query". Now that it it no longer the case, and I can
operate back into the language theory space, what do I need XML for?

Jun 13 '07 #11
On 13 jun, 23:56, Vadim Tropashko <vadimtro_inva. ..@yahoo.comwro te:
>
[...] Then, your reply was a critical for me
understanding that the tree structure is unnecessary, the derivation
is essentially a language -- a set of words (which includes both
terminals and nonterminals) and this set of words can be quieried
solely with the language theory means. Formally, a query is a language
intersection.
Of course, all computation, including RDBMS and XML querying and
transformation, is ultimately just string manipulation. Doesn't mean
that Turing Machines are always the most appropriate formalism for
describing them, does it? :-)

-- Jan Hidders

Jun 13 '07 #12
On Jun 13, 3:47 pm, Jan Hidders <hidd...@gmail. comwrote:
On 13 jun, 23:56, Vadim Tropashko <vadimtro_inva. ..@yahoo.comwro te:
[...] Then, your reply was a critical for me
understanding that the tree structure is unnecessary, the derivation
is essentially a language -- a set of words (which includes both
terminals and nonterminals) and this set of words can be quieried
solely with the language theory means. Formally, a query is a language
intersection.

Of course, all computation, including RDBMS and XML querying and
transformation, is ultimately just string manipulation. Doesn't mean
that Turing Machines are always the most appropriate formalism for
describing them, does it? :-)
You seem to imply that language theory reduces to trivial string
manipulation. I expected you to request clarification why a query is a
language intersection; is there any nonvacuous idea behind this
statement? And insist on demonstating that the reg exp method does
work:-)

Consider a simple expression grammar

expr :
expr '+' expr
| '(' expr ')'
| '1'

and the string

(1 + 1) + 1

When we ask if this string belongs to the language defined by the
grammar, we are interested to know if the intersection of the infinite
language defined by the above expression grammar and finite language
coonsisting of a single string is empty.

The derivation that proves that a string belongs to a language is a
chain of inequalities (where inequality is understood to be the usual
language subset relation):

(1 + 1) + 1 < (expr + 1) + 1 < ... < (expr + expr) + expr < expr +
expr < expr

Now we have derivation tree:

1 expr
1.1 expr
1.1.1 (
1.1.2 expr
1.1.2.1 expr
1.1.2.1.1 1
1.1.2.2 +
1.1.2.3 expr
1.1.2.3.1 1
1.1.3 )
1.2 +
1.2 expr
1.2.1 1

Next if we encode tree nodes as paths of the grammar symbols, then not
only the order is lost (as you correctly observed), but the duplicate
paths are collapsed:

1 --expr
1.1 --expr expr
1.1.1 --expr expr (
1.1.2 --expr expr expr
1.1.2.1 --expr expr expr expr
1.1.2.1.1 --expr expr expr expr expr
1.1.2.2
1.1.2.3
1.1.2.3.1
1.1.3
1.2
1.2 --expr expr (same as 1.1!)
1.2.1



Jun 13 '07 #13
On Jun 13, 2:25 pm, Jan Hidders <hidd...@gmail. comwrote:
On 13 jun, 21:10, Vadim Tropashko <vadimtro_inva. ..@yahoo.comwro te:
On Jun 13, 9:19 am, Jan Hidders <hidd...@gmail. comwrote:
Ok. I'm going to assume the following DTD (in a notation of my own
making to make it a bit more readable) for the syntax tree. (It uses
no attributes to keep things simple):
<stat_bl--<decl_kw<decl_i tem_list<begin_ kw<statements>
<semicol_kw<e nd-kw>
<decl_kw--EMPTY
<begin_kw--EMPTY
<end_kw--EMPTY
<semicol_kw--EMPTY
<statements--( <stat_bl| <assignment)+
<assignment--<var<assign_k w( <var| <number) <semicol_kw>
<var--PCDATA
<assign_kw--EMPTY
<number--PCDATA
<decl_item_li st--( <var<type<semic ol_kw)+
<type--PCDATA
All right, the DTD is a bastardized [context free?] grammar describing
a language that XML document is an element of. Although the adjectives
"cumbersome " and "ugly" still apply, I grudgingly admit that the idea
that a grammar fits into DTD effortlessly is quite powerful (so I'm
removing the "XML sucks" image from my homepage:-)

Of course, it combines ugliness with great power. Welcome to the
Dark Side. :-)
This raises some questions for me. Does the above mean we
should consider comparing XML and its associated ... whadayacallits?
XPath, XSL, etc.?, against more traditional parsing techniques?
*** What, if anything, would that leave out? ***

It seems clear enough, the idea of regarding an XML document
as a syntax tree, and the DTD as a grammar. In which case,
*XML*parsing is then the analog of ... what? Converting an
ascii version of a syntax tree into a binary syntax tree?
Sort of like parsing s-expressions?

Then XML querying and updating is comparable to operations
we would do once we had a syntax tree, which seems to
be multipass traversal and transformation. Think of an
interpreter, which traverses the tree, carrying out instructions
as it does so. The more complex case is a compiler
with multiple analysis, optimizer, and code generator passes.
These involve generating additional data structures which
may or may not be trees as well. A symbol table in a
lexically scoped language likely would be; an object file
likely would not be.

If that analysis is valid, then I'd be interested to continue to
a comparison with other techniques for accomplishing the
same thing, particularly from the standpoint of expressive
power, but also from the standpoint of convenience and
generality.

Other techniques I can think of are:

1) ad hoc traversal of an object graph in an OO language
2) pattern matching over abstract data types, as introduced
by SML. (The gold standard.)
3) "Tree walking" a la Terrance Parr / Antlr (the upstart.)

1) is what I am most familiar with. It isn't particularly pleasant,
but I suppose it's no more objectionable than anything else
in an OOPL. One ends up using the Visitor pattern a lot,
and one necessarily finds oneself having to defeat the
type system, since OOPLs are organized around extensible
structures w/ fixed operations, whereas trees work best
with fixed structures w/ extensible operations.

2) I would be very interested to hear a comparison with
pattern matching from someone who is familiar with both
techniques. One thing I can see immediately is that XPath
expressions allow one to address only some specific
part of a tree that one is interested in, whereas pattern
matching requires whole-tree analysis. How much of an
issue is this in practice?

3) is interesting. I admit that when I first heard about the
idea I was extremely skeptical. The idea that one would
write a grammar to abstract operating on a tree one had
generated by using a grammar to abstract over a language
seemed perverse and unnecessarily complex at first.
However in practice it's turned out to be the opposite:
it is extremely simple. Tree grammars are a fraction
of the size of the original grammar, possible similarly
to the size of an XPath expression to a DTD.

I suppose there are also term rewriting languages but I'm
not as familiar with them. I am also vaguely aware of
"strategies " which are abstractions of data structure
traversal techniques. I suppose further there are the
various tree encoding techniques of SQL, but I haven't found
these to be very exciting. Perhaps a more complete
relational language ...

Btw., if you forget about sequence order, document order and value
comparisons for a moment, the core of XPath is actually a relatively
neat calculus of binary relations:
- p_1/p_2 is the concatenation of binary relations
- p_1[p_2] is the selection of pairs in p_2 whose right-hand side
matches a left-hand side in p_2 (semijoin anyone?)
- p_1 union p_2 is the set union of two binary relations
- p_1 intersect p_2 is the set intersection of two binary relations
- p_1 except p_2 is the set difference between binary relations

Did I mention Tarski already? :-) Of course we had to wait until
XPath2.0 until the three set operations were all allowed everywhere,
but we have them now.
I guess we're talking about Relation Algebra here? I don't know
enough about it to know if this is just an interesting comparison
of if there's some specific result being alluded to.

My hope is that this post will stimulate an educated and informative
discussion of the relative merits of different approaches to
tree traversal, not neglecting to differentiate among the needs
of diverse use cases.

Hey, it could happen.
Marshall

Jun 13 '07 #14
On Jun 13, 4:47 pm, Marshall <marshall.spi.. .@gmail.comwrot e:
3) is interesting. I admit that when I first heard about the
idea I was extremely skeptical. The idea that one would
write a grammar to abstract operating on a tree one had
generated by using a grammar to abstract over a language
seemed perverse and unnecessarily complex at first.
However in practice it's turned out to be the opposite:
it is extremely simple. Tree grammars are a fraction
of the size of the original grammar, possible similarly
to the size of an XPath expression to a DTD.
Do you have a reference?
http://antlr.org/article/1170602723163/treewalkers.html
insists that the "tree grammar" concept is not appeared to be coherent
to begin with...

Jun 14 '07 #15
On Jun 13, 5:35 pm, Vadim Tropashko <vadimtro_inva. ..@yahoo.com>
wrote:
On Jun 13, 4:47 pm, Marshall <marshall.spi.. .@gmail.comwrot e:
3) is interesting. I admit that when I first heard about the
idea I was extremely skeptical. The idea that one would
write a grammar to abstract operating on a tree one had
generated by using a grammar to abstract over a language
seemed perverse and unnecessarily complex at first.
However in practice it's turned out to be the opposite:
it is extremely simple. Tree grammars are a fraction
of the size of the original grammar, possible similarly
to the size of an XPath expression to a DTD.

Do you have a reference?http://antlr.org/article/1170602723163/treewalkers.html
insists that the "tree grammar" concept is not appeared to be coherent
to begin with...
Well, that paper's on Terrence's site, so probably he doesn't
see it as *too* devastating. :-)

But really, I had all those arguments in my head, until I tried it.
It turned out really easy to do.

In something I did recently, a simple expression grammar,
all the precedence, etc, went in the "regular" grammar.
So the AST captured all of that. Then I needed to translate
the AST into custom Java objects. It was ridiculously easy;
the tree grammar was just:

prog: expr*;

expr: <node1 a, b{new Node1(a, b);}
| <node2{new Node2();}
| ...
;

Perhaps my next step is getting rid of the Java AST
objects and just using the Antlr AST object.

Anyway, I've been very impressed with ANTLR.

As an aside, another idea of Terrence's was an IDE for
Antrlr, which I also thought was ridiculous when I first heard
about it. But now v3.0 is out and I've used the IDE and
it's as much a win for parsers as it is for source code.
Being able to interactively tweak the grammar and
parse programs with each revision, having a CST and AST
view, being able to single step through the parse, etc.
Marshall

Jun 14 '07 #16
Jan Hidders wrote:
>hmmm. actually I'm not sure what $dvar/string() was intended to mean,
and I don't have time right now to check that.

It retrieves the text node children of the nodes in $dvar.
No, that would be $dvar/text().

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Jun 14 '07 #17
On 14 jun, 07:49, Joe Kesselman <keshlam-nos...@comcast. netwrote:
Jan Hidders wrote:
hmmm. actually I'm not sure what $dvar/string() was intended to mean,
and I don't have time right now to check that.
It retrieves the text node children of the nodes in $dvar.

No, that would be $dvar/text().
Oops, you are right of course. The function string() concatenates all
the string values of the descendant text nodes. So in this case it
simply retrieves the string value of the text node under the element
node that is in $dvar.

-- Jan Hidders

Jun 14 '07 #18
On 14 jun, 01:47, Marshall <marshall.spi.. .@gmail.comwrot e:
>
It seems clear enough, the idea of regarding an XML document
as a syntax tree, and the DTD as a grammar. In which case,
*XML*parsing is then the analog of ... what? Converting an
ascii version of a syntax tree into a binary syntax tree?
Sort of like parsing s-expressions?
Sure, ordered node-labeled trees, S-expressions, XML documents,
recursively nested terms, et cetera, are all more or less the same
thing. Of course the typical use cases are different, which justifies
having diferent transformation / update / query / selection languages
and formalisms.
If that analysis is valid, then I'd be interested to continue to
a comparison with other techniques for accomplishing the
same thing, particularly from the standpoint of expressive
power, but also from the standpoint of convenience and
generality.

Other techniques I can think of are:

1) ad hoc traversal of an object graph in an OO language
2) pattern matching over abstract data types, as introduced
by SML. (The gold standard.)
3) "Tree walking" a la Terrance Parr / Antlr (the upstart.)
Yep, Tree automata, Tree transducers, Attribute grammars, XQuery,
XSLT... all have their typical use cases. Is that what you are
interested in?
2) I would be very interested to hear a comparison with
pattern matching from someone who is familiar with both
techniques. One thing I can see immediately is that XPath
expressions allow one to address only some specific
part of a tree that one is interested in, whereas pattern
matching requires whole-tree analysis. How much of an
issue is this in practice?
I think that question needs more context in order to be meaningful.
The pattern matching mechanism is usually embedded in another
languages (some functional programming language like Caml, XSLT or
XQuery). Depending upon that language it may or may not be a problem
if your pattern matching mechanism lacks certain expressive power.
Btw., if you forget about sequence order, document order and value
comparisons for a moment, the core of XPath is actually a relatively
neat calculus of binary relations:
- p_1/p_2 is the concatenation of binary relations
- p_1[p_2] is the selection of pairs in p_2 whose right-hand side
matches a left-hand side in p_2 (semijoin anyone?)
- p_1 union p_2 is the set union of two binary relations
- p_1 intersect p_2 is the set intersection of two binary relations
- p_1 except p_2 is the set difference between binary relations
Did I mention Tarski already? :-) Of course we had to wait until
XPath2.0 until the three set operations were all allowed everywhere,
but we have them now.

I guess we're talking about Relation Algebra here? I don't know
enough about it to know if this is just an interesting comparison
of if there's some specific result being alluded to.
Yep.

<http://staff.science.u va.nl/~marx/talks/2005/screen_icdt.pdf >

Page 12.

-- Jan Hidders

Jun 14 '07 #19
On 14 jun, 01:39, Vadim Tropashko <vadimtro_inva. ..@yahoo.comwro te:
On Jun 13, 3:47 pm, Jan Hidders <hidd...@gmail. comwrote:
Of course, all computation, including RDBMS and XML querying and
transformation, is ultimately just string manipulation. Doesn't mean
that Turing Machines are always the most appropriate formalism for
describing them, does it? :-)

You seem to imply that language theory reduces to trivial string
manipulation.
Trivial? You think that the string manipulations expressed by Turing
Machines are trivial?
I expected you to request clarification why a query is a
language intersection; is there any nonvacuous idea behind this
statement? And insist on demonstating that the reg exp method does
work:-)
I though insisting that it cannot work would accomplish the same. But
your explanation still does not seem complete. Also you are still a
bit vague on what you are claiming. Are you saying that anything for
which one would typically use XPath can be done just as well with
regular expressions assuming that the tree is somehow encoded in a
string? This is clearly not the case for the encoding you gave. Or are
you going to extend that? Perhaps also extend the regular expressions
a little? Or did you just have a particular query in mind? That would
probably not be a very interesting observation. So which is it?

-- Jan Hidders

Jun 14 '07 #20

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
4121
by: Mike N. | last post by:
Hello to all: First let me apologize for the length of this question, I've made an attempt to include as much information as is needed to help with the question. I am having problems putting together a query to pull out an alternative hierarchical view of my data. The database is implemented under SQL Sever 2000 and I am writing the front end using VB.Net and ADO.net. The following is the portion of my database structure that I am...
3
5590
by: Cláudia Morgado | last post by:
Hello! Oracle has the option with the SQL CONECT BY statement to run through a hierarchical database with a single SQl-statement: <!--SQL SELECT ms_id,ms_parent FROM messages CONNECT BY PRIOR ms_id = ms_parent START WITH ms_id = 1 --> Result-set (example): ms_id parent_id 1 1.1 1 1.1.1 1.1 1.1.2 1.1 1.1.3 1.1 1.2 1 1.2.1 1.2
13
5908
by: Anton.Nikiforov | last post by:
Hello everybody! Does someone know how to build hierarchical queries to the postgresql? I have a table with tree in it (id, parent) and need to find a way from any point of the tree to any other point. And i would like to have a list of all steps from point A to point B to make some changes on each step (this is required by the algorythm). Here is an example:
1
2620
by: News Server | last post by:
Any help appreciated. I have two Access tables, customer and orders. I would like to create a truly hierarchical xml file form the joined tables. I need to produce: <Query1> <Customer> <Name>Bob</Name> <Order> <ID>1001</ID>
4
2610
by: Daisy | last post by:
Let's say I've got a forum, where users can be moderators of each forum. Tables look like this: USER -------- user_key name FORUM
5
2257
by: clintonG | last post by:
I'm looking for documentation and would not turn my nose up to any code from anybody who thinks they are good at the design of an algorythm that can be used to generated a hierarchical relational data model. What? A Yahoo-like drill-down menu that is a series of categories and nested categories is a hierarchical relational data model. An example can be seen at but the review of the query string values strongly indicates
3
2044
by: Bennett Haselton | last post by:
I want to display a hierarchical listing of items from a database table, where, say, each row in the table has an "ID" field and a "parent_id" field giving the ID of its parent (NULL if it's at the top level of the hierarchy) -- like message posts and their replies. Is there a built-in way to do this, or a generally accepted simplest way? My first idea was to create a user control like HierarchicalListing that contains a Repeater, and...
1
1868
by: rose74 | last post by:
hi i'd like to know how can i merge two hierarchical trees together. for instnace to query for each deparatment all its employees with considering to the employees hierarchicy (managers-employees).
12
5824
by: Steve | last post by:
I have been studying the Adjacency List Model as a means of achieving a folder structure in a project I am working on. Started with the excellent article by Gijs Van Tulder http://www.sitepoint.com/article/hierarchical-data-database My database has this basic structure: Id FolderName
0
10200
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10139
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9984
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
9020
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7529
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6769
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5551
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
3701
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2909
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.