
Joining XML files?

Question posted by: rhino (Guest) on July 1st, 2008 08:05 PM
I'm very new to XML and maybe just a touch impatient because I'm going to
ask a moderately advanced question even though I'm just learning the basics.

I've spent many years working with databases, both hierarchical and
relational. So far, XML is obviously hierarchical in nature. I'm wondering
if there is anything analogous to a relational "join" in XML?

For example, let's say I have an XML file that has a root element of
departments. Each record of the file has information about a single
department in a company and consists of a department number, department
name, manager name, and location. Let's say I have another XML file that
lists employees. Each record is an employee and gives information about the
employee's name, date of birth, department number, home address, etc.

Given that these are two separate XML files but that there is some common
information, specifically the department number, could I use XSLT to
generate a report that shows me each department name followed by the names
of the people who work in the department? Something like this:

Marketing
Department Number: 001
Location: New York
Manager: T. Jones

Other Staff
E. Humperdinck
E. Presley
J. Hendrix

Information Systems
Department Number: 666
Location: Toronto
Manager: M. Slate

Other Staff
F. Flintstone
B. Rubble
J. Rockhead

In other words, we're getting the department name and manager name from the
departments file and the "other staff" names from the employees file. We
know which employees go in which departments because the department number
is in both the departments file and in the employees file.

Is it conceptually possible to do this kind of joining in XSLT? If so, what
is this called? In other words, what are the main terms I need to know here?
I'd call this a join in relational database terminology but I imagine XSLT
has different terminology.

If this IS possible, can someone point me to a tutorial or reference that
explains how to write XSLT to do this?

--
Rhino


Reply #2 posted by: Joseph J. Kesselman (Guest) on July 1st, 2008 08:25 PM

Re: Joining XML files?
rhino wrote:
Quote:
Given that these are two separate XML files but that there is some common
information, specifically the department number, could I use XSLT to
generate a report that shows me each department name followed by the names
of the people who work in the department?


XSLT can certainly reference more than one input source, using the
document() function; then it's just a matter of writing expressions that
use data from one document to look up information in the other document.

There are probably examples on the XSLT FAQ website... but seriously,
once you know about the document() function it really isn't any harder
than if you were recombining data read from a single document.

The only tricky part, really, is deciding how you're going to tell the
stylesheet which two sources to look at. Common solutions are passing
one of the URIs in as a parameter, or having one hardcoded into the
stylesheet, or having a front-end document which the stylesheet obtains
both the actual URIs from.... Which of those solutions is best depends
on the environment you're performing this operation in. Note that all of
'em are extensible to more than 2 documents.

As to what to call it: Conceptually it's certainly a join or merge. The
former term is more likely to be recognized by DB and data-structure
folks, the latter is more familiar to folks coming to XML and XSLT from
the document-markup side of the world. I wouldn't get hung up on the
terminology; the clearest solution is probably to do exactly what you
did, provide a brief example of what you're trying to accomplish.
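
A minimal sketch of the document() approach described above; it is an illustration, not code from the original post. It assumes the stylesheet is run against the departments file, that the employees file sits next to the stylesheet as employees.xml, and that both files use the element names from the sample files posted later in this thread (department_number, name, location, manager). The parameter name employees-uri is hypothetical.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Assumed parameter: URI of the second input; the default is
       resolved relative to the stylesheet's own location. -->
  <xsl:param name="employees-uri" select="'employees.xml'"/>

  <!-- Load the second document once and reuse it for every department. -->
  <xsl:variable name="employees" select="document($employees-uri)"/>

  <xsl:template match="/departments">
    <xsl:apply-templates select="department"/>
  </xsl:template>

  <xsl:template match="department">
    <xsl:variable name="dept" select="department_number"/>
    <xsl:value-of select="name"/>
    <xsl:text>&#10;Department Number: </xsl:text>
    <xsl:value-of select="$dept"/>
    <xsl:text>&#10;Location: </xsl:text>
    <xsl:value-of select="location"/>
    <xsl:text>&#10;Manager: </xsl:text>
    <xsl:value-of select="manager"/>
    <xsl:text>&#10;&#10;Other Staff&#10;</xsl:text>
    <!-- The "join": pick the employees whose department_number has the
         same string value as this department's number. -->
    <xsl:for-each select="$employees/employees/employee[department_number = $dept]">
      <xsl:value-of select="name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With xsltproc, for instance, this could be run as "xsltproc join.xsl departments.xml" (join.xsl is an assumed file name), and a different employees file could be supplied with "--stringparam employees-uri some-other-file.xml". Hardcoding the URI or reading both URIs from a front-end document would change only how $employees is obtained; the predicate doing the lookup stays the same.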

Reply #3 posted by: rhino (Guest) on July 1st, 2008 10:15 PM

Re: Joining XML files?

"Joseph J. Kesselman" <keshlam-nospam@comcast.net> wrote in message
news:486a9058$1@kcnews01...
Quote:
rhino wrote:
Quote:
Given that these are two separate XML files but that there is some common
information, specifically the department number, could I use XSLT to
generate a report that shows me each department name followed by the
names of the people who work in the department?

XSLT can certainly reference more than one input source, using the
document() function; then it's just a matter of writing expressions that
use data from one document to look up information in the other document.

There are probably examples on the XSLT FAQ website... but seriously, once
you know about the document() function it really isn't any harder than if
you were recombining data read from a single document.

The only tricky part, really, is deciding how you're going to tell the
stylesheet which two sources to look at. Common solutions are passing one
of the URIs in as a parameter, or having one hardcoded into the
stylesheet, or having a front-end document which the stylesheet obtains
both the actual URIs from.... Which of those solutions is best depends on
the environment you're performing this operation in. Note that all of 'em
are extensible to more than 2 documents.

As to what to call it: Conceptually it's certainly a join or merge. The
former term is more likely to be recognized by DB and data-structure
folks, the latter is more familiar to folks coming to XML and XSLT from
the document-markup side of the world. I wouldn't get hung up on the
terminology; the clearest solution is probably to do exactly what you did,
provide a brief example of what you're trying to accomplish.


Thank you once again, Joseph! This definitely gets me going in the right
direction.

I was counting on something like this being possible for the project I am
designing. The fact that it is possible, and apparently pretty routine, is
VERY helpful in planning what I need to do next. (After I work out a couple
of examples of joins/merges, that is!)

I really appreciate your assistance with my questions today!

--
Rhino



Reply #4 posted by: Hermann Peifer (Guest) on July 6th, 2008 04:55 PM

Re: Joining XML files?
rhino wrote:
Quote:
Given that these are two separate XML files but that there is some common
information, specifically the department number, could I use XSLT to
generate a report that shows me each department name followed by the names
of the people who work in the department? Something like this:

Marketing
Department Number: 001
Location: New York
Manager: T. Jones

Other Staff
E. Humperdinck
E. Presley
J. Hendrix

Information Systems
Department Number: 666
Location: Toronto
Manager: M. Slate

Other Staff
F. Flintstone
B. Rubble
J. Rockhead


Just in case you (or someone else) might be interested in a non-XSLT solution: here is a small xgawk script that does the job.

$ cat departments.xml
<?xml version="1.0" encoding="UTF-8"?>
<departments>
  <department>
    <department_number>001</department_number>
    <name>Marketing</name>
    <location>New York</location>
    <manager>T. Jones</manager>
  </department>
  <department>
    <department_number>666</department_number>
    <name>Information Systems</name>
    <location>Toronto</location>
    <manager>M. Slate</manager>
  </department>
</departments>

$ cat employees.xml
<?xml version="1.0" encoding="UTF-8"?>
<employees>
  <employee>
    <name>E. Humperdinck</name>
    <department_number>001</department_number>
  </employee>
  <employee>
    <name>E. Presley</name>
    <department_number>001</department_number>
  </employee>
  <employee>
    <name>J. Hendrix</name>
    <department_number>001</department_number>
  </employee>
  <employee>
    <name>F. Flintstone</name>
    <department_number>666</department_number>
  </employee>
  <employee>
    <name>B. Rubble</name>
    <department_number>666</department_number>
  </employee>
  <employee>
    <name>J. Rockhead</name>
    <department_number>666</department_number>
  </employee>
</employees>

$ cat join.awk
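@load xml

# Reset the text buffer at each start tag; remember the latest character data.
XMLSTARTELEM { data = "" ; next }
XMLCHARDATA  { data = $0 ; next }

# End of a field element (depth 3): store its text under the element name
# and keep the current record's department number at hand.
XMLDEPTH == 3 && XMLENDELEM {
    a[XMLENDELEM] = data
    dept = a["department_number"]
}

# First file on the command line (employees.xml): collect names per department.
NR == FNR && XMLENDELEM == "employee" {
    o[dept] = o[dept] sep[dept] " " a["name"]
    sep[dept] = "\n"
    next
}

# Second file (departments.xml): print each department, then its collected staff.
XMLENDELEM == "department" {
    print a["name"]
    print " Department Number: " dept
    print " Location: " a["location"]
    print " Manager: " a["manager"]

    print ORS " Other Staff" ORS o[dept] ORS
}

# Report an XML parse error, if there was one.
END { if (XMLERROR) print XMLERROR }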

$ xgawk -f join.awk employees.xml departments.xml
Marketing
Department Number: 001
Location: New York
Manager: T. Jones

Other Staff
E. Humperdinck
E. Presley
J. Hendrix

Information Systems
Department Number: 666
Location: Toronto
Manager: M. Slate

Other Staff
F. Flintstone
B. Rubble
J. Rockhead

Reply #5 posted by: Peter Flynn (Guest) on July 13th, 2008 08:25 PM

Re: Joining XML files?
rhino wrote:
Quote:
I'm very new to XML and maybe just a touch impatient because I'm going to
ask a moderately advanced question even though I'm just learning the basics.

I've spent many years working with databases, both hierarchical and
relational. So far, XML is obviously hierarchical in nature. I'm wondering
if there is anything analogous to a relational "join" in XML?


Be aware of the warning in http://xml.silmaril.ie/authors/databases/
XML is a language specification, not a database application. While there
are some similarities, there are a *lot* of differences.

///Peter

 