
SQL for Modeling Generalization Hierarchies

Is there a good approach to modelling many heterogeneous entity types
that have some attributes in common?

Say I have entities "employees" which share some attributes (e.g.
firstname, lastname, dateofbirth) but some subsets of employees (e.g.
physicians, janitors, nurses, ambulance drivers) may have additional
attributes that do not apply to all employees. Physicians may have
attributes specialty and date of board certification, ambulance
drivers may have a drivers license id, janitors may have
preferredbroomtype and so on.

There are many employee subtypes and more can be dynamically added
after the application is deployed so it's obviously no good to keep
adding attributes to the employees table because most attributes will
be NULL (since janitors are never doctors at the same time).

The only solution I found for this is a generalization hierarchy where
you have the employee table with all generic attributes and then you
add tables for each new employee subtype as necessary. The subtype
tables share the primary key of the employee table. The employee table
has a "discriminator" field that allows you to figure out which
subtype table to load for a particular entity.
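Concretely, the kind of schema I have in mind looks roughly like this
(names and datatypes are just for illustration, not a finished design):

CREATE TABLE employees
(employee_id   INTEGER     NOT NULL PRIMARY KEY,
 employee_type VARCHAR(20) NOT NULL,           -- the "discriminator"
 firstname     VARCHAR(50) NOT NULL,
 lastname      VARCHAR(50) NOT NULL,
 dateofbirth   DATE        NOT NULL);

CREATE TABLE physicians
(employee_id            INTEGER NOT NULL PRIMARY KEY
                        REFERENCES employees (employee_id),
 specialty              VARCHAR(50) NOT NULL,
 board_certification_dt DATE);

CREATE TABLE janitors
(employee_id        INTEGER NOT NULL PRIMARY KEY
                    REFERENCES employees (employee_id),
 preferredbroomtype VARCHAR(30));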

This solution does not seem to scale since for each value of
"discriminator" I need to perform a join with a different table. What
if I need to retrieve 1,000 employees at once?

Is it possible to obtain a single ResultSet with one SQL statement?

Or do I need to iterate, look at the discriminator, and then
perform the appropriate join? If this kind of iteration is necessary
then obviously this generalization hierarchy approach does not work in
practice since it would be painfully slow.

Is there a better approach to modelling these kinds of heterogeneous
entities with shared attributes that does not involve creating a table
for each new employee type or having sparse tables (mostly filled with
NULLs)?

I guess another approach would be to use name/value pairs but that
would make reporting really ugly.

Seems like a very common problem. Any ideas? Is this a fundamental
limitation of SQL?

Thanks!

- robert
Jul 19 '05 #1
15 8826

"Robert Brown" <ro*************@yahoo.com> wrote in message
news:24**************************@posting.google.c om...
Is there a good approach to modelling many heterogeneous entity types
that have some attributes in common?
This is a frequently asked question. Unfortunately, the frequently given
responses don't usually settle the matter.

There are many employee subtypes and more can be dynamically added
after the application is deployed so it's obviously no good to keep
adding attributes to the employees table because most attributes will
be NULL (since janitors are never doctors at the same time).

The only solution I found for this is a generalization hierarchy where
you have the employee table with all generic attributes and then you
add tables for each new employee subtype as necessary. The subtype
tables share the primary key of the employee table. The employee table
has a "discriminator" field that allows you to figure out which
subtype table to load for a particular entity.
If you invent new subtypes on an ad hoc basis (e.g. a DBA is never either a
janitor or a doctor), and you invent new tables for subtypes on an equally
ad hoc basis, I'd suggest that you are altering your data model on an ad hoc
basis.

You can do that if you want, but don't expect the same power and simplicity
that you get from a stable data model.

This solution does not seem to scale since for each value of
"discriminator" I need to perform a join with a different table. What
if I need to retrieve 1,000 employees at once?
Why would you ever need the attributes of more than one subtype? If your
query is about doctors, why would you need to join the table about janitors
into the query?

If your query is about employees, why would you need to know any of the
subtype attributes?
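For instance (using made-up employees/physicians tables rather than your
actual schema), a query that is about doctors only ever touches the
physicians subtype table:

SELECT e.firstname, e.lastname, p.specialty
FROM employees e
JOIN physicians p
  ON p.employee_id = e.employee_id;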

Or do I need to iterate, look at the discriminator, and then
perform the appropriate join? If this kind of iteration is necessary
then obviously this generalization hierarchy approach does not work in
practice since it would be painfully slow.
You need to ask two questions about the model, and about any implementation
based on the model:
First, is it correct and complete? Second, if yes to the first, is it fast
enough?

In general, different implementations of the same model should be
transparent to users of the implementation who only look at the features
visible in the model.


Is there a better approach to modelling these kinds of heterogeneous
entities with shared attributes that does not involve creating a table
for each new employee type or having sparse tables (mostly filled with
NULLs)?
If you have new entities, you are going to have new relations. That either
means having new tables, or fudging the relationship between tables and
relations. Do the second one at your own peril.
Seems like a very common problem. Any ideas? Is this a fundamental
limitation of SQL?


I would suggest it goes beyond SQL to the very heart of using foreign key/
primary key associations to establish linkages.
Jul 19 '05 #2

"Robert Brown" <ro*************@yahoo.com> wrote in message
news:24**************************@posting.google.c om...
adding attributes to the employees table because most attributes will
be NULL (since janitors are never doctors at the same time).


By the way, after I wrote "a DBA is never a janitor or a doctor" it
occurred to me that when I was a DBA there were days when I felt like I was
both at the same time! ;)

Jul 19 '05 #3
Robert Brown (ro*************@yahoo.com) writes:
The only solution I found for this is a generalization hierarchy where
you have the employee table with all generic attributes and then you
add tables for each new employee subtype as necessary. The subtype
tables share the primary key of the employee table. The employee table
has a "discriminator" field that allows you to figure out which
subtype table to load for a particular entity.

This solution does not seem to scale since for each value of
"discriminator" I need to perform a join with a different table. What
if I need to retrieve 1,000 employees at once?
So what? 1000 rows for a modern RDBMS is a breeze.
Is it possible to obtain a single ResultSet with one SQL statement?
Yes, although with many discriminators, it will be one hell of a join:

SELECT main.col1, main.col2, ... d1.col1, d1.col2, ...
FROM main
LEFT JOIN subtype1 d1 ON d1.keycol = main.keycol
LEFT JOIN subtype2 d2 ON d2.keycol = main.keycol
...

In practice, things might be even messier, because some values might
apply to four discriminators, but be irrelevant to the rest. In that
case there would be four different columns. Although, this could be
addressed with:

common_to_four = coalesce(d3.common, d6.common, d9.common, d11.common)

But it may be better to add this as a nullable column to the common
table.

Likewise, if two discriminators are very similar, it may be better to
lump them in the same table.
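As a rough sketch with made-up names (an employees supertype with physicians
and janitors subtypes, not your actual schema), the single statement could
look like:

SELECT e.employee_id, e.firstname, e.lastname, e.employee_type,
       p.specialty, p.board_certification_dt,
       j.preferredbroomtype
FROM employees e
LEFT JOIN physicians p ON p.employee_id = e.employee_id
LEFT JOIN janitors   j ON j.employee_id = e.employee_id;

Columns that belong to another subtype simply come back as NULL, but
everything arrives in one result set.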
I guess another approach would be to use name/value pairs but that
would make reporting really ugly.
And you get less control over your spelling errors. But sometimes this
is the way to go.
Seems like a very common problem. Any ideas? Is this a fundamental
limitation of SQL?


Sort of. SQL tables are square, and object-oriented hierarchies are
jagged.

But that's alright, just don't be too object-oriented. Be pragmatic too.

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 19 '05 #4
"Laconic2" <la******@comcast.net> wrote in message news:<E8********************@comcast.com>...
"Robert Brown" <ro*************@yahoo.com> wrote in message
news:24**************************@posting.google.c om...
Is there a good approach to modelling many heterogeneous entity types
that have some attributes in common?
This is a frequently asked question. Unfortunately, the frequently given
responses don't usually settle the matter.


Could you please post links to any relevant past threads if you
remember where to find them? I tried to search the newsgroup for
"generalization hierarchy" but did not find anything useful. Is there
a better search term I can use?
There are many employee subtypes and more can be dynamically added
after the application is deployed so it's obviously no good to keep
adding attributes to the employees table because most attributes will
be NULL (since janitors are never doctors at the same time).

The only solution I found for this is a generalization hierarchy where
you have the employee table with all generic attributes and then you
add tables for each new employee subtype as necessary. The subtype
tables share the primary key of the employee table. The employee table
has a "discriminator" field that allows you to figure out which
subtype table to load for a particular entity.
If you invent new subtypes on an ad hoc basis (e.g. a DBA is never either a
janitor or a doctor), and you invent new tables for subtypes on an equally
ad hoc basis, I'd suggest that you are altering your data model on an ad hoc
basis.

You can do that if you want, but don't expect the same power and simplicity
that you get from a stable data model.

This solution does not seem to scale since for each value of
"discriminator" I need to perform a join with a different table. What
if I need to retrieve 1,000 employees at once?


Why would you ever need the attributes of more than one subtype? If your
query is about doctors, why would you need to join the table about janitors
into the query?


Because I need to display users and their heterogeneous attributes in
the UI as one list (e.g. in response to a search).
If your query is about employees, why would you need to know any of the
subtype attributes?


Same reason, our customer's HR department does not see the employees
as separate entities. To them an employee is an employee and they want
to see all the associated data while browsing employees. I guess you
could show the generic attributes on the first pass and then require
the end user to click to drill down and see the subtype attributes -
but that's not what they want.

Or do I need to iterate, look at the discriminator, and then
perform the appropriate join? If this kind of iteration is necessary
then obviously this generalization hierarchy approach does not work in
practice since it would be painfully slow.


You need to ask two questions about the model, and about any implementation
based on the model:
First, is it correct and complete? Second, if yes to the first, is it fast
enough?

In general, different implementations of the same model should be
transparent to users of the implementation who only look at the features
visible in the model.


Is there a better approach to modelling these kinds of heterogeneous
entities with shared attributes that does not involve creating a table
for each new employee type or having sparse tables (mostly filled with
NULLs)?


If you have new entities, you are going to have new relations. That either
means having new tables, or fudging the relationship between tables and
relations. Do the second one at your own peril.
Seems like a very common problem. Any ideas? Is this a fundamental
limitation of SQL?


I would suggest it goes beyond SQL to the very heart of using foreign key/
primary key associations to establish linkages.

Jul 19 '05 #5
In the last exciting episode, "Laconic2" <la******@comcast.net> wrote:
"Robert Brown" <ro*************@yahoo.com> wrote in message
news:24**************************@posting.google.c om...
adding attributes to the employees table because most attributes will
be NULL (since janitors are never doctors at the same time).


By the way, after I wrote "a DBA is never a janitor or a doctor" it
occurred to me that when I was a DBA there were days when I felt
like I was both at the same time! ;)


Some days it also combines coroner and roto-rooter operator...

A couple coworkers were in that state today :-(.
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','acm.org').
http://www3.sympatico.ca/cbbrowne/advocacy.html
If a person with multiple personalities threatens suicide, is that
considered a hostage situation?
Jul 19 '05 #6
This is always messy. With dynamically added subtypes, I would
consider keeping everything soft. Here is a first approximation for
the tables:

Valid Subtypes       (Subtype, ?Desc)
Valid Attributes     (Attribute, ?Datatype, ?Desc)
Attributes/Subtype   (Subtype -> Valid Subtypes, Attribute -> Valid Attributes)
Employee             (Employee #, Subtype -> Valid Subtypes)
Employee Attributes  (Employee # -> Employee, Attribute, Attribute data)

You could then build a list of available attributes for an employee
based on his subtype from the Attributes/Subtype table. The data
entry screen(s) could be built from the Valid Attributes
table (depending on how smart it needs to be).
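In rough DDL the same idea might look like this (names and datatypes are
only illustrative):

CREATE TABLE valid_subtypes
(subtype     VARCHAR(20)  NOT NULL PRIMARY KEY,
 description VARCHAR(100));

CREATE TABLE valid_attributes
(attribute   VARCHAR(30)  NOT NULL PRIMARY KEY,
 datatype    VARCHAR(20)  NOT NULL,
 description VARCHAR(100));

-- which attributes are valid for which subtype
CREATE TABLE subtype_attributes
(subtype   VARCHAR(20) NOT NULL REFERENCES valid_subtypes (subtype),
 attribute VARCHAR(30) NOT NULL REFERENCES valid_attributes (attribute),
 PRIMARY KEY (subtype, attribute));

CREATE TABLE employees
(employee_id INTEGER     NOT NULL PRIMARY KEY,
 subtype     VARCHAR(20) NOT NULL REFERENCES valid_subtypes (subtype));

-- one row per employee per populated attribute
CREATE TABLE employee_attributes
(employee_id    INTEGER     NOT NULL REFERENCES employees (employee_id),
 attribute      VARCHAR(30) NOT NULL REFERENCES valid_attributes (attribute),
 attribute_data VARCHAR(255),
 PRIMARY KEY (employee_id, attribute));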

This is a bit rough, but I hope the idea comes through. Control over
the Valid Subtypes table, the Attributes table, etc. would need to be
defined (who and how), but, gosh, I would hate to have to create a new
table every time a new subtype was added.

LOL.. here I go again, ask me what time it is and I tell you how to
build a watch. Hope this helps more than it hurts.

Payson

Jul 19 '05 #7
>> Is there a good approach to modelling many heterogeneous entity
types that have some attributes in common? <<

Ignoring that the phrase "entity type" is OO **yuck!**

The classic scenario calls for a root class with all the common
attributes and then specialized sub-classes under it. As an example,
let's take the class of Vehicles and find an industry standard
identifier (VIN), and add two mutually exclusive sub-classes, Sport
utility vehicles and sedans ('SUV', 'SED').

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
UNIQUE (vin, vehicle_type),
..);

Notice the overlapping candidate keys. I then use a compound candidate
key (vin, vehicle_type) and a constraint in each sub-class table to
assure that the vehicle_type is locked and agrees with the Vehicles
table. Add some DRI actions and you are done:

CREATE TABLE SUV
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
CHECK(vehicle_type = 'SUV'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type = 'SED'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

I can continue to build a hierarchy like this. For example, if I had
a Sedans table that broke down into two-door and four-door sedans, I
could use a schema like this:

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type IN ('2DR', '4DR', 'SED')),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE TwoDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '2DR' NOT NULL
CHECK(vehicle_type = '2DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE FourDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '4DR' NOT NULL
CHECK(vehicle_type = '4DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans (vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

The idea is to build a chain of identifiers and types in a UNIQUE()
constraint that go up the tree when you use a REFERENCES constraint.
Obviously, you can do variants of this trick to get different class
structures.

If an entity doesn't have to be exclusively one subtype, you play with
the root of the class hierarchy:

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
PRIMARY KEY (vin, vehicle_type),
..);

Now start hiding all this stuff in VIEWs immediately and add an
INSTEAD OF trigger to those VIEWs.
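As an illustration only (SQL Server syntax; the non-key columns are invented
since the ".." columns above are elided), a view over the Sedans branch plus
an INSTEAD OF trigger might look something like this:

CREATE VIEW SedanVehicles
AS
SELECT v.vin, v.vehicle_type, s.door_cnt   -- door_cnt is a made-up Sedans column
FROM Vehicles AS v
JOIN Sedans AS s
  ON s.vin = v.vin
 AND s.vehicle_type = v.vehicle_type;
GO

CREATE TRIGGER SedanVehicles_ins
ON SedanVehicles
INSTEAD OF INSERT
AS
BEGIN
    -- Insert the supertype row first, then the subtype row.
    INSERT INTO Vehicles (vin, vehicle_type)
    SELECT vin, 'SED' FROM inserted;

    INSERT INTO Sedans (vin, vehicle_type, door_cnt)
    SELECT vin, 'SED', door_cnt FROM inserted;
END;
GO

Callers then INSERT and SELECT against SedanVehicles and never see the two
underlying tables.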

Here is the link on Amazon.com for my new book on "Trees & Hierarchies
in SQL"

http://www.amazon.com/exec/obidos/tg...roduct-details
Jul 19 '05 #8
--CELKO-- (jc*******@earthlink.net) writes:
CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
UNIQUE (vin, vehicle_type),
..);

Notice the overlapping candidate keys. I then use a compound candidate
key (vin, vehicle_type) and a constraint in each sub-class table to
assure that the vehicle_type is locked and agrees with the Vehicles
table. Add some DRI actions and you are done:


Ah! Interesting trick! I'll save a mental note about that.

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 19 '05 #9
jc*******@earthlink.net (--CELKO--) wrote in message news:<18**************************@posting.google. com>...

Good stuff, but there's one thing I don't get.
Why would one need NOT NULL in the following DDL,

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type IN ('2DR', '4DR', 'SED')),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

since by definition a PRIMARY KEY cannot be null? Why would I tell
people "I'm an American and an American citizen" instead of just
saying "I'm an American"?
Jul 19 '05 #10
ro*************@yahoo.com (Robert Brown) wrote:

This solution does not seem to scale since for each value of
"discriminator" I need to perform a join with a different table. What
if I need to retrieve 1,000 employees at once?


Why would you ever need the attributes of more than one subtype? If
your query is about doctors, why would you need to join the table
about janitors into the query?


Because I need to display users and their heterogeneous attributes in
the UI as one list (e.g. in response to a search).


Uh, that isn't really an answer.
If your query is about employees, why would you need to know any of the
subtype attributes?


Same reason, our customer's HR department does not see the employees
as separate entities. To them an employee is an employee and they want
to see all the associated data while browsing employees. I guess you
could show the generic attributes on the first pass and then require
the end user to click to drill down and see the subtype attributes -
but that's not what they want.


It sounds to me like the real solution is to fire your HR department and
replace them with people who just do their jobs, rather than browsing
around in other employees' information out of idle curiosity.

If the main goal is to provide custom interfaces, one per job-type, that
support job-specific functions--and satisfying nosy Nellies is only
secondary--then I think I would go with the main-table / multiple sub-table
setup. When people want to see mixed aggregations of job-types for
no apparent reason, they can damn well wait the 5.2 seconds it will take to
assemble it.

If the main or only goal is to satisfy nosy Nellies, I would either go
with an auxiliary table of name-value pairs (and another auxiliary table
listing allowed or suggested attributes for each job type), or just
do away with all those tables and make a "free text" CLOB column on the
employee table (or call it XML rather than free text, if that type of
thing turns you on).

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
Jul 19 '05 #11
<ct****@hotmail.com> wrote in message
news:20*******************@newsreader.com...
ro*************@yahoo.com (Robert Brown) wrote:
If the main or only goal is to satisfy nosy Nellies, I would either go
with an auxiliary table of name-value pairs (and another auxiliary table
listing allowed or suggested attributes for each job type), or just
do away with all those tables and make a "free text" CLOB column on the
employee table (or call it XML rather than free text, if that type of
thing turns you on).


Yeah, right, make it XML, and watch your performance going down the drain.

If you want "flexible" structure, use auxilliary name-value table. Convert
it into nested collection if you have object fever.
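Reporting against the name-value rows usually means pivoting them back into
columns; a rough sketch against made-up tables (employees plus an
employee_attributes name-value table) could look like:

SELECT e.employee_id,
       MAX(CASE WHEN a.attribute = 'specialty'
                THEN a.attribute_data END) AS specialty,
       MAX(CASE WHEN a.attribute = 'drivers_license_id'
                THEN a.attribute_data END) AS drivers_license_id,
       MAX(CASE WHEN a.attribute = 'preferredbroomtype'
                THEN a.attribute_data END) AS preferredbroomtype
FROM employees e
LEFT JOIN employee_attributes a
  ON a.employee_id = e.employee_id
GROUP BY e.employee_id;

One CASE branch per attribute is exactly the "ugly reporting" mentioned
earlier, but the data does come back as ordinary columns.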

Jul 19 '05 #12
ct****@hotmail.com wrote:
My 2 cents thrown in below:


When I was dealing with large database systems, we tried
to make a distinction between "transaction" and "report"
and tried to get the response time requirements ignored
for "reports".

Anything which returns 1,000 rows is no longer a "transaction"
in my book. As Xho said, "... they can damn well wait ..."

If this were a print job, it would print on roughly 20 pages,
give or take. It would take whoever about 2 minutes
to even give it a cursory scan. About the same amount of time
to scroll down through a screenful of 1000 entries too,
probably more.

Questions I used to ask when confronted with this:
- What is the business need for this information?
- Can you usually wait for it overnight?
- In a pinch (when overnight isn't good enough), would
10 minutes be OK?
- If you really need it in 1-2 seconds, how much is your
hardware budget? (Because I'm going to need more/faster hardware
to satisfy this request in just a second or two.)

What actually used to worry me about having such "reports"
as on-line "transactions" was how they would impact the
response times for the other users who were doing
stuff which actually did require rapid response
(e.g. pull up an individual's medical history in an
emergency).

(I think that was actually 3 cents :)

NPL
--
"It is impossible to make anything foolproof
because fools are so ingenious"
- A. Bloch
Jul 19 '05 #13
da**@rock.com (DaaaDaaa) wrote in message news:<21*************************@posting.google.c om>...

Good stuff, but there's one thing I don't get.
Why would one need NOT NULL in the following DDL,
CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type IN ('2DR', '4DR', 'SED')),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

Methinks he means theoretically the primary key could be separate from
the vin definition. Just because Oracle disallows it doesn't mean it
is redundant. I don't know about ANSI or other db's offhand.
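For example, the key can be declared as a separate table-level constraint
rather than on the column itself; whether the column-level NOT NULL is then
redundant depends on the SQL dialect and standard version (a cut-down sketch
of the Sedans table, other columns omitted):

CREATE TABLE Sedans
(vin          CHAR(17) NOT NULL,
 vehicle_type CHAR(3)  DEFAULT 'SED' NOT NULL,
 PRIMARY KEY (vin),
 UNIQUE (vin, vehicle_type));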
since by definition a PRIMARY KEY cannot be null? Why would I tell
people "I'm an American and an American citizen" instead of just
saying "I'm an American"?


'Cause you are not these guys:
http://www.signonsandiego.com/news/m...m11deport.html
http://www.signonsandiego.com/uniont...n29deport.html

Obviously, the attributes of American are cultural, and of a citizen
are legal. A citizen could be raised abroad and not consider
themselves American, and it is possible to have null citizenship.
http://www.keepmedia.com/ShowItemDet...0032&oliID=213
http://www.hrw.org/reports/1995/Kuwait.htm

jg
--
@home.com is bogus.
Happy Birthday, Jerry Mathers.
Jul 19 '05 #14
Taking out of context slightly:
Tom Kyte's viewpoint on this kind of thing I think is:

"sounds like a single table design to me. I do not factor out 1:1
optional
relationships unless a table is "really really wide" -- really wide
being into 3 digits.

put the most likely to be null columns at the end of the create table
and when
they are NULL, they will consume NO space.

benchmark it. you would always have to join to pick up this optional
information (2 or 3 LIO's at least per row retrieved for each optional
set of
data) vs an extra 50 bytes of flags saying "this is null". I would go
for the
extra 50 bytes in a row that will be accessed via an index rather then
incurring 2/3 LIO's to read an index to access another table. "

The question was asking about subtype modeling, but the example the
questioner posed wasn't really a subtype. The info they were talking
about was most likely actually M:1 relationships, but Tom's comments
apply to the subtype situation you are discussing.

Your model and application are much simpler with this single table
model.
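In other words, something along these lines, with the rarely populated
subtype columns trailing (column names invented for illustration):

CREATE TABLE employees
(employee_id            INTEGER     NOT NULL PRIMARY KEY,
 employee_type          VARCHAR(20) NOT NULL,
 firstname              VARCHAR(50) NOT NULL,
 lastname               VARCHAR(50) NOT NULL,
 dateofbirth            DATE        NOT NULL,
 -- subtype-specific columns, left NULL for everyone else
 specialty              VARCHAR(50),
 board_certification_dt DATE,
 drivers_license_id     VARCHAR(20),
 preferredbroomtype     VARCHAR(30));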

Read the link for yourself to see the full thread:

http://asktom.oracle.com/pls/ask/f?p...12678058160686
Jul 19 '05 #15
Job Miller allegedly said, on my timestamp of 3/06/2004 7:08 PM:

put the most likely to be null columns at the end of the create table and
when they are NULL, they will consume NO space.

The problem is when you may have two or more of these
optional 1:1 relationships. What happens when you've got a bunch
of columns set to NULL, another bunch set to values, then another bunch
set to NULL again, and someone comes along and fills the first bunch
with values?
benchmark it. you would always have to join to pick up this optional
information (2 or 3 LIO's at least per row retrieved for each optional set
of data) vs an extra 50 bytes of flags saying "this is null". I would go
for the extra 50 bytes in a row that will be accessed via an index rather
then incurring 2/3 LIO's to read an index to access another table."
I wouldn't. If it made the app easier to document, code
and expand, I'd trade the extra 2 or 3 LIOs easily. Heck, I'd trade
10 or more LIOs easily, for that! That is the sort of price I don't
mind paying.

Assuming of course I'm not designing/writing the
next 10000000/TPS TPC benchmark.
Then again, how many of those has anyone found in real life?
Then again, how many of those do indeed need a 1:1?
It's all relative, no?

In general, designs calling for that sort of level of
specification are not for performance critical apps, which makes
the extra LIO quite acceptable.
Your model and application are much simpler with this single table
model.


I don't think so. I have to add a heap of IS [NOT] NULL combinations
to handle the existence or non-existence of subtypes. IS NULL is not
an easy condition to combine with others. And it makes the code
unnecessarily dense, as well as stopping me from ever being able
to run this app in a db that doesn't support NULLs the way
Oracle does. In all, not a good choice.
I'm afraid on that one I'll have to disagree with Tom, assuming once
again we are not talking about a TPC benchmark type of app.
--
Cheers
Nuno Souto
wi*******@yahoo.com.au.nospam
Jul 19 '05 #16
