Can XQuery return a whole document sans a subsection?

ctchrinthry

I have some large and complex XML documents. I want to return the
whole document with some sections trimmed away.

Right now, I read the whole document into some Python code, walk the
tree, snip the nodes that I don't want anymore, then dump the whole
tree out.

It seems that it would be better to use an XML database and XQuery to
do this. However, the large XML document has a large and changing
structure. I have to do this over a number of different documents with
a common subsection. I don't know how to say "give me this whole
document, except for this part, where I just want you to return the
nodes that match this query."

If your answer is just "buy this book: XXXX" that's fine. But I'd like
to know if it's possible and where to look.

Sep 8 '06 #1

Joseph Kesselman

Since you can do this in XSLT, and since XQuery is in large part
equivalent to XSLT 2.0 (different syntax but same underlying processing
model), I would expect XQuery can do it. I don't have a good example handy.
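
For a concrete picture of "the whole document minus a subsection": the usual XSLT idiom is an identity copy with an override for the part to drop. What follows is only a Python analogue of that idea using the standard library, not XSLT or XQuery. The function name `copy_without`, the predicate, and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET


def copy_without(elem, should_drop):
    """Return a deep copy of elem, omitting every descendant that should_drop accepts."""
    clone = ET.Element(elem.tag, dict(elem.attrib))
    clone.text = elem.text
    clone.tail = elem.tail
    last = None  # most recently copied child, used to re-attach trailing text
    for child in elem:
        if should_drop(child):
            # The text after a dropped element belongs to the parent, so keep it.
            tail = child.tail or ""
            if last is None:
                clone.text = (clone.text or "") + tail
            else:
                last.tail = (last.tail or "") + tail
            continue
        last = copy_without(child, should_drop)
        clone.append(last)
    return clone


# Hypothetical document: everything is copied except the <secret> element.
SAMPLE = "<doc><keep>a</keep><secret>b</secret>tail<keep>c</keep></doc>"
root = ET.fromstring(SAMPLE)
trimmed = copy_without(root, lambda e: e.tag == "secret")
print(ET.tostring(trimmed, encoding="unicode"))
# <doc><keep>a</keep>tail<keep>c</keep></doc>
```

The copy never needs to know the overall document structure, only how to recognise the part to leave out, which is what makes the pattern tolerant of a large and changing layout.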

Sep 8 '06 #2

Peter Flynn

ct*********@gmail.com wrote:

> I have some large and complex XML documents. I want to return the
> whole document with some sections trimmed away.
>
> Right now, I read the whole document into some Python code, walk the
> tree, snip the nodes that I don't want anymore, then dump the whole
> tree out.
>
> It seems that it would be better to use an XML database and XQuery to
> do this. However, the large XML document has a large and changing
> structure.

If the document is that dynamic, a database won't be any help to you.
It sounds very much like a candidate for an XML server like Cocoon or
PropelX, using XSLT to perform the subsetting.

///Peter
--
XML FAQ: http://xml.silmaril.ie

Sep 9 '06 #3

ctchrinthry

Thank you to both of you! I never thought of XSLT--however, I am a bit
of an XML neophyte.

I am not sure I can do what I want with XSLT. The document is pretty
complicated--you need a JOIN to do what I want in SQL, which is how the
data used to be stored.

<xml>
<time id="t1" timespec="12:34:17">
<time id="t2" timespec="12:34:19">
<row id="1">
<event start="t1" end="t2"/>
<event start="t3" end="t4"/>
</row>
<row id="2">
<event start="z1" end="z2"/>
<event start="z3" end="z4"/>
</row>

That is more or less the format. So, I want to say, "give me row one, where
the events are between timespec time1 and time2" and also "delete the
time tags that aren't needed anymore, while you're at it."

And, there's a lot of unstructured stuff surrounding these tags that I
can't just throw away or easily re-create.

It seems (though I emphasize I am very new at this stuff) that the
simplest thing to do is to read everything into a parsed XML tree,
walk the tree, knock out the nodes I don't want anymore, and export the
tree back to an XML document. The good part is that this is only about
40 lines of code and it works.
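
A minimal sketch of that read, prune, and write-back approach, using only the Python standard library. It is not the original 40 lines. It assumes a well-formed variant of the sample (self-closed `<time/>` elements and a closing `</xml>`), an attribute spelled `timespec`, and zero-padded HH:MM:SS values. The `<note>` element, the extra time values, and the function name `prune` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed variant of the sample format.
SAMPLE = """<xml>
<note>free-form material that must survive</note>
<time id="t1" timespec="12:34:17"/>
<time id="t2" timespec="12:34:19"/>
<time id="t3" timespec="12:40:00"/>
<time id="t4" timespec="12:41:00"/>
<time id="z1" timespec="13:00:00"/>
<time id="z2" timespec="13:00:05"/>
<row id="1">
<event start="t1" end="t2"/>
<event start="t3" end="t4"/>
</row>
<row id="2">
<event start="z1" end="z2"/>
</row>
</xml>"""


def prune(root, row_id, lo, hi):
    """Trim one row to events inside [lo, hi], then drop unreferenced <time> elements."""
    # The "join": map each time id to its timespec.
    # Zero-padded HH:MM:SS strings compare in chronological order.
    times = {t.get("id"): t.get("timespec") for t in root.iter("time")}

    for row in root.iter("row"):
        if row.get("id") != row_id:
            continue  # other rows are left untouched
        for event in list(row):  # iterate over a copy because row is mutated
            if event.tag != "event":
                continue
            start = times.get(event.get("start"))
            end = times.get(event.get("end"))
            inside = start is not None and end is not None and lo <= start and end <= hi
            if not inside:
                row.remove(event)

    # Collect the time ids still referenced by any surviving event.
    used = set()
    for event in root.iter("event"):
        used.update((event.get("start"), event.get("end")))

    # ElementTree keeps no parent pointers, so visit each parent and filter its children.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "time" and child.get("id") not in used:
                parent.remove(child)
    return root


root = prune(ET.fromstring(SAMPLE), "1", "12:34:00", "12:35:00")
print(ET.tostring(root, encoding="unicode"))
```

Everything the function does not explicitly remove, such as the `<note>` element here, is written back unchanged, which is the property that matters when the surrounding material cannot be re-created.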

This seems pretty ugly to me, however, and I hate inelegant
solutions. My document set is not very dynamic--I have a repository
of maybe a hundred big XML documents, and only a few are added a week.
So an XML database where the documents are already parsed seems
logical.

I actually thought of ways to save the parse tree using mmap and
zipping through that, but that is trying way too hard, IMHO.

dave

Sep 11 '06 #4

Peter Flynn

ct*********@gmail.com wrote:

> Thank you to both of you! I never thought of XSLT--however, I am a bit
> of an XML neophyte.
>
> I am not sure I can do what I want with XSLT. The document is pretty
> complicated--you need a JOIN to do what I want in SQL, which is how the
> data used to be stored.

Forget relational database theory here. XML ain't a database.

> <xml>
> <time id="t1" timespec="12:34:17">
> <time id="t2" timespec="12:34:19">
> <row id="1">
> <event start="t1" end="t2"/>
> <event start="t3" end="t4"/>
> </row>
> <row id="2">
> <event start="z1" end="z2"/>
> <event start="z3" end="z4"/>
> </row>

That isn't well-formed XML.

> That is more or less the format. So, I want to say, "give me row one, where
> the events are between timespec time1 and time2" and also "delete the
> time tags that aren't needed anymore, while you're at it."

I only see a t1 and a t2 defined. Do I assume there are dozens of
<time> elements, defining t* and z*?

And do you mean "events starting between" or "events wholly taking
place between"?
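
The two readings give different answers for the same event, which a small Python sketch can make concrete. The window bounds below are hypothetical; only the 12:34:17 and 12:34:19 values come from the sample.

```python
from datetime import time


def starts_between(start, end, lo, hi):
    """True when the event begins inside the window, whatever its end time."""
    return lo <= start <= hi


def wholly_between(start, end, lo, hi):
    """True when the event both begins and ends inside the window."""
    return lo <= start and end <= hi


# Hypothetical query window that closes one second before the event ends.
window = (time(12, 34, 0), time(12, 34, 18))
event = (time.fromisoformat("12:34:17"), time.fromisoformat("12:34:19"))

print(starts_between(*event, *window))  # True
print(wholly_between(*event, *window))  # False
```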

I'm also not clear what time1 and time2 are: if the first event in
the first row starts at t1 and ends at t2, what is your query input?
Is this time1 and time2?

It would be possible, although complex, to write XSLT to implement
this selection. It would be orders of magnitude easier if the data
was structured more usefully. Using XSLT and XQuery to compensate
for poorly-designed data models is possible, but inadvisable.

> And, there's a lot of unstructured stuff surrounding these tags that I
> can't just throw away or easily re-create.

That can usually be preserved fairly easily.

> It seems (though I emphasize I am very new at this stuff) that the
> simplest thing to do is to read everything into a parsed XML tree,
> walk the tree, knock out the nodes I don't want anymore, and export the
> tree back to an XML document. The good part is that this is only about
> 40 lines of code and it works.

That's pretty much it, except that you do it by keeping the nodes
you want rather than removing those you don't want. And you almost
certainly don't want to do it by walking the tree: XQuery lets you
"cherry-pick" just those nodes which satisfy your conditions, and
ignore everything else.
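
As a rough illustration of selecting rather than deleting, here is a Python sketch that uses the limited XPath subset in the standard library's ElementTree. It is an analogue of the idea, not XQuery. The sample is a hypothetical well-formed fragment, and the `<result>` wrapper is an invented name.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed fragment in the shape of the sample rows.
SAMPLE = """<xml>
<row id="1"><event start="t1" end="t2"/><event start="t3" end="t4"/></row>
<row id="2"><event start="z1" end="z2"/></row>
</xml>"""

root = ET.fromstring(SAMPLE)

# Select only the events of row 1 that start at t1; no hand-written tree walk.
picked = root.findall("./row[@id='1']/event[@start='t1']")

# Assemble a fresh result from the selected nodes instead of deleting from the original.
# The element objects are shared with the source tree, which is left unmodified.
result = ET.Element("result")
result.extend(picked)
print(ET.tostring(result, encoding="unicode"))
```

The path expression states the condition once, and anything that does not match is simply never selected.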

> This seems pretty ugly to me, however, and I hate inelegant
> solutions. My document set is not very dynamic--I have a repository
> of maybe a hundred big XML documents, and only a few are added a week.
> So an XML database where the documents are already parsed seems
> logical.

I'm not clear what advantage putting the data in an XML database would
bring you, especially if the document structure is suboptimal for processing.

///Peter
--
XML FAQ: http://xml.silmaril.ie/

Sep 11 '06 #5
