This is more of a general SQL relationship question than a
PostgreSQL-specific question, although I am using PostgreSQL to
implement this.
I have a number of data structures which I am calling "units". A unit
has a name, number and a data structure. I store a unit as a single
record in a table with fields for the name and number and a field for an
XML string which represents the data on the unit.
- table "units":
numb name XML
1 unit1 <data ... />
2 unit2 <data refers_to="unit1" ... />
3 unit3 <data refers_to="unit2" ... />
There are relationships between these units which I would like the
database to understand. If one unit refers to other units, I would like
the database to return all referenced units in a single SELECT call.
therefore:
- loading 1 would return unit1
- loading 2 would return unit2 and unit1
- loading 3 would return unit3, unit2 and unit1
I currently handle this outside of the database with multiple SELECT
calls - I read a single unit from the database, examine the XML data for
references to other units, load those units, rinse and repeat until each
referenced unit is loaded. I achieve the objective of loading all units
that are referenced by originating unit, but I have to make multiple
SELECT calls to get there.
I don't expect the database to understand the XML data structure that
describes the unit relationships, of course. Instead, I would like to
put the right table structure in place to begin with and use SQL syntax
to make a single SELECT call from one table that returns one or more
units from another table.
My best guess right now is a "relationships" table:
rel_id unit_numb rel_numb
1 1 1
2 2 2
3 2 1
4 3 3
5 3 2
6 3 1
to load unit3, I would "SELECT rel_numb FROM relationships WHERE
unit_numb = 3" and then load all records from the "units" table whose
number matches the returned rel_numb values.
This seems cacky and smells of a non-SQL approach. I don't know whether
I would be able to SELECT it all with a single call or whether I would
have to separate it into 2 calls. There is also a certain amount of
duplication - rel_id 6 should be unnecessary because rel_id 5 should
imply rel_id 3 (in other words, unit3's relationship to unit1 exists
because of the unit3->unit2 and unit2->unit1 relationship).
Am I heading in the right direction? I am an experienced Perl programmer
but my experience with SQL is limited. Can someone tell me how much of
this SQL can do for me, given the right table setup?
Thanks,
/dave