
SQL for Modeling Generalization Hierarchies

Is there a good approach to modelling many heterogeneous entity types
that have some attributes in common?

Say I have entities "employees" which share some attributes (e.g.
firstname, lastname, dateofbirth) but some subsets of employees (e.g.
physicians, janitors, nurses, ambulance drivers) may have additional
attributes that do not apply to all employees. Physicians may have
attributes specialty and date of board certification, ambulance
drivers may have a drivers license id, janitors may have
preferredbroomtype and so on.

There are many employee subtypes and more can be dynamically added
after the application is deployed so it's obviously no good to keep
adding attributes to the employees table because most attributes will
be NULL (since janitors are never doctors at the same time).

The only solution I found for this is a generalization hierarchy where
you have the employee table with all generic attributes and then you
add tables for each new employee subtype as necessary. The subtype
tables share the primary key of the employee table. The employee table
has a "discriminator" field that allows you to figure out which
subtype table to load for a particular entity.
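Concretely, the kind of schema I have in mind looks roughly like this
(names and datatypes are just for illustration, not a finished design):

CREATE TABLE employees
(employee_id   INTEGER     NOT NULL PRIMARY KEY,
 employee_type VARCHAR(20) NOT NULL,           -- the "discriminator"
 firstname     VARCHAR(50) NOT NULL,
 lastname      VARCHAR(50) NOT NULL,
 dateofbirth   DATE        NOT NULL);

CREATE TABLE physicians
(employee_id            INTEGER NOT NULL PRIMARY KEY
                        REFERENCES employees (employee_id),
 specialty              VARCHAR(50) NOT NULL,
 board_certification_dt DATE);

CREATE TABLE janitors
(employee_id        INTEGER NOT NULL PRIMARY KEY
                    REFERENCES employees (employee_id),
 preferredbroomtype VARCHAR(30));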

This solution does not seem to scale since for each value of
"discriminator" I need to perform a join with a different table. What
if I need to retrieve 1,000 employees at once?

Is it possible to obtain a single ResultSet with one SQL statement?

Or do I need to iterate, look at the discriminator, and then
perform the appropriate join? If this kind of iteration is necessary
then obviously this generalization hierarchy approach does not work in
practice since it would be painfully slow.

Is there a better approach to modelling these kinds of heterogeneous
entities with shared attributes that does not involve creating a table
for each new employee type or having sparse tables (mostly filled with
NULLs)?

I guess another approach would be to use name/value pairs but that
would make reporting really ugly.

Seems like a very common problem. Any ideas? Is this a fundamental
limitation of SQL?

Thanks!

- robert
Jul 19 '05 #1
15 8826

"Robert Brown" <ro*************@yahoo.com> wrote in message
news:24**************************@posting.google.c om...
Is there a good approach to modelling many heterogeneous entity types
that have some attributes in common?
This is a frequently asked question. Unfortunately, the frequently given
responses don't usually settle the matter.

There are many employee subtypes and more can be dynamically added
after the application is deployed so it's obviously no good to keep
adding attributes to the employees table because most attributes will
be NULL (since janitors are never doctors at the same time).

The only solution I found for this is a generalization hierarchy where
you have the employee table with all generic attributes and then you
add tables for each new employee subtype as necessary. The subtype
tables share the primary key of the employee table. The employee table
has a "discriminator" field that allows you to figure out which
subtype table to load for a particular entity.
If you invent new subtypes on an ad hoc basis (e.g. a DBA is never either a
janitor or a doctor), and you invent new tables for subtypes on an equally
ad hoc basis, I'd suggest that you are altering your data model on an ad hoc
basis.

You can do that if you want, but don't expect the same power and simplicity
that you get from a stable data model.

This solution does not seem to scale since for each value of
"discriminator" I need to perform a join with a different table. What
if I need to retrieve 1,000 employees at once?
Why would you ever need the attributes of more than one subtype? If your
query is about doctors, why would you need to join the table about janitors
into the query?

If your query is about employees, why would you need to know any of the
subtype attributes?
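For instance (using made-up employees/physicians tables rather than your
actual schema), a query that is about doctors only ever touches the
physicians subtype table:

SELECT e.firstname, e.lastname, p.specialty
FROM employees e
JOIN physicians p
  ON p.employee_id = e.employee_id;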

Or do I need to iterate, look at the discriminator, and then
perform the appropriate join? If this kind of iteration is necessary
then obviously this generalization hierarchy approach does not work in
practice since it would be painfully slow.
You need to ask two questions about the model, and about any implementation
based on the model:
First, is it correct and complete? Second, if yes to the first, is it fast
enough?

In general, different implementations of the same model should be
transparent to users of the implementation who only look at the features
visible in the model.


Is there a better approach to modelling these kinds of heterogeneous
entities with shared attributes that does not involve creating a table
for each new employee type or having sparse tables (mostly filled with
NULLs)?
If you have new entities, you are going to have new relations. That either
means having new tables, or fudging the relationship between tables and
relations. Do the second one at your own peril.
Seems like a very common problem. Any ideas? Is this a fundamental
limitation of SQL?


I would suggest it goes beyond SQL to the very heart of using foreign key/
primary key associations to establish linkages.
Jul 19 '05 #2

"Robert Brown" <ro*************@yahoo.com> wrote in message
news:24**************************@posting.google.c om...
adding attributes to the employees table because most attributes will
be NULL (since janitors are never doctors at the same time).


By the way, after I wrote "a DBA is never a janitor or a doctor" it
occurred to me that when I was a DBA there were days when I felt like I was
both at the same time! ;)

Jul 19 '05 #3
Robert Brown (ro*************@yahoo.com) writes:
The only solution I found for this is a generalization hierarchy where
you have the employee table with all generic attributes and then you
add tables for each new employee subtype as necessary. The subtype
tables share the primary key of the employee table. The employee table
has a "discriminator" field that allows you to figure out which
subtype table to load for a particular entity.

This solution does not seem to scale since for each value of
"discriminator" I need to perform a join with a different table. What
if I need to retrieve 1,000 employees at once?
So what? 1000 rows for a modern RDBMS is a breeze.
Is it possible to obtain a single ResultSet with one SQL statement?
Yes, although with many discriminators, it will be one hell of a join:

SELECT main.col1, main.col2, ... d1.col1, d1.col2, ...
FROM main
LEFT JOIN subtype1 d1 ON d1.keycol = main.keycol
LEFT JOIN subtype2 d2 ON d2.keycol = main.keycol
...

In practice, things might be even messier, because some values might
apply to four discriminators, but be irrelevant to the rest. In that
case there would be four different columns. Although, this could be
addressed with:

common_to_four = coalesce(d3.common, d6.common, d9.common, d11.common)

But it may be better to add this as a nullable column to the common
table.

Likewise, if two discriminators are very similar, it may be better to
lump them in the same table.
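As a rough sketch with made-up names (an employees supertype with physicians
and janitors subtypes, not your actual schema), the single statement could
look like:

SELECT e.employee_id, e.firstname, e.lastname, e.employee_type,
       p.specialty, p.board_certification_dt,
       j.preferredbroomtype
FROM employees e
LEFT JOIN physicians p ON p.employee_id = e.employee_id
LEFT JOIN janitors   j ON j.employee_id = e.employee_id;

Columns that belong to another subtype simply come back as NULL, but
everything arrives in one result set.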
I guess another approach would be to use name/value pairs but that
would make reporting really ugly.
And you get less control over your spelling errors. But sometimes this
is the way to go.
Seems like a very common problem. Any ideas? Is this a fundamental
limitation of SQL?


Sort of. SQL tables are square, and object-oriented hierarchies are
jagged.

But that's alright, just don't be too object-oriented. Be pragmatic too.

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 19 '05 #4
"Laconic2" <la******@comcast.net> wrote in message news:<E8********************@comcast.com>...
"Robert Brown" <ro*************@yahoo.com> wrote in message
news:24**************************@posting.google.c om...
Is there a good approach to modelling many heterogeneous entity types
that have some attributes in common?
This is a frequently asked question. Unfortunately, the frequently given
responses don't usually settle the matter.


Could you please post links to any relevant past threads if you
remember where to find them? I tried to search the newsgroup for
"generalization hierarchy" but did not find anything useful. Is there
a better search term I can use?
There are many employee subtypes and more can be dynamically added
after the application is deployed so it's obviously no good to keep
adding attributes to the employees table because most attributes will
be NULL (since janitors are never doctors at the same time).

The only solution I found for this is a generalization hierarchy where
you have the employee table with all generic attributes and then you
add tables for each new employee subtype as necessary. The subtype
tables share the primary key of the employee table. The employee table
has a "discriminator" field that allows you to figure out which
subtype table to load for a particular entity.
If you invent new subtypes on an ad hoc basis (e.g. a DBA is never either a
janitor or a doctor), and you invent new tables for subtypes on an equally
ad hoc basis, I'd suggest that you are altering your data model on an ad hoc
basis.

You can do that if you want, but don't expect the same power and simplicity
that you get from a stable data model.

This solution does not seem to scale since for each value of
"discriminator" I need to perform a join with a different table. What
if I need to retrieve 1,000 employees at once?


Why would you ever need the attributes of more than one subtype? If your
query is about doctors, why would you need to join the table about janitors
into the query?


Because I need to display users and their heterogeneous attributes in
the UI as one list (e.g. in response to a search).
If your query is about employees, why would you need to know any of the
subtype attributes?


Same reason, our customer's HR department does not see the employees
as separate entities. To them an employee is an employee and they want
to see all the associated data while browsing employees. I guess you
could show the generic attributes on the first pass and then require
the end user to click to drill down and see the subtype attributes -
but that's not what they want.

Or do I need to iterate, look at the discriminator, and then
perform the appropriate join? If this kind of iteration is necessary
then obviously this generalization hierarchy approach does not work in
practice since it would be painfully slow.


You need to ask two questions about the model, and about any implementation
based on the model:
First, is it correct and complete? Second, if yes to the first, is it fast
enough?

In general, different implementations of the same model should be
transparent to users of the implementation who only look at the features
visible in the model.


Is there a better approach to modelling these kinds of heterogeneous
entities with shared attributes that does not involve creating a table
for each new employee type or having sparse tables (mostly filled with
NULLs)?


If you have new entities, you are going to have new relations. That either
means having new tables, or fudging the relationship between tables and
relations. Do the second one at your own peril.
Seems like a very common problem. Any ideas? Is this a fundamental
limitation of SQL?


I would suggest it goes beyond SQL to the very heart of using foreign key/
primary key associations to establish linkages.

Jul 19 '05 #5
In the last exciting episode, "Laconic2" <la******@comcast.net> wrote:
"Robert Brown" <ro*************@yahoo.com> wrote in message
news:24**************************@posting.google.c om...
adding attributes to the employees table because most attributes will
be NULL (since janitors are never doctors at the same time).


By the way, after I wrote "a DBA is never a janitor or a doctor" it
occurred to me that when I was a DBA there were days when I felt
like I was both at the same time! ;)


Some days it also combines coroner and roto-rooter operator...

A couple coworkers were in that state today :-(.
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','acm.org').
http://www3.sympatico.ca/cbbrowne/advocacy.html
If a person with multiple personalities threatens suicide, is that
considered a hostage situation?
Jul 19 '05 #6
This is always messy. With dynamically added subtypes, I would
consider keeping everything soft. Here is a first approximation for
the tables:

Valid Subtypes       (Subtype, ?Desc)
Valid Attributes     (Attribute, ?Datatype, ?Desc)
Attributes/Subtype   (Subtype -> Valid Subtypes, Attribute -> Valid Attributes)
Employee             (Employee #, Subtype -> Valid Subtypes)
Employee Attributes  (Employee # -> Employee, Attribute, Attribute data)

You could then build a list of available attributes for an employee
based on his subtype from the Attributes/Subtype table. The data
entry screen(s) could be built from the Valid Attributes
table (depending on how smart it needs to be).
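In rough DDL the same idea might look like this (names and datatypes are
only illustrative):

CREATE TABLE valid_subtypes
(subtype     VARCHAR(20)  NOT NULL PRIMARY KEY,
 description VARCHAR(100));

CREATE TABLE valid_attributes
(attribute   VARCHAR(30)  NOT NULL PRIMARY KEY,
 datatype    VARCHAR(20)  NOT NULL,
 description VARCHAR(100));

-- which attributes are valid for which subtype
CREATE TABLE subtype_attributes
(subtype   VARCHAR(20) NOT NULL REFERENCES valid_subtypes (subtype),
 attribute VARCHAR(30) NOT NULL REFERENCES valid_attributes (attribute),
 PRIMARY KEY (subtype, attribute));

CREATE TABLE employees
(employee_id INTEGER     NOT NULL PRIMARY KEY,
 subtype     VARCHAR(20) NOT NULL REFERENCES valid_subtypes (subtype));

-- one row per employee per populated attribute
CREATE TABLE employee_attributes
(employee_id    INTEGER     NOT NULL REFERENCES employees (employee_id),
 attribute      VARCHAR(30) NOT NULL REFERENCES valid_attributes (attribute),
 attribute_data VARCHAR(255),
 PRIMARY KEY (employee_id, attribute));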

This is a bit rough, but I hope the idea comes through. Control over
the Valid Subtypes table, the Attributes table, etc. would need to be
defined (who and how), but, gosh, I would hate to have to create a new
table every time a new subtype was added.

LOL.. here I go again, ask me what time it is and I tell you how to
build a watch. Hope this helps more than it hurts.

Payson

Jul 19 '05 #7
>> Is there a good approach to modelling many heterogeneous entity
types that have some attributes in common? <<

Ignoring that the phrase "entity type" is OO **yuck!**

The classic scenario calls for a root class with all the common
attributes and then specialized sub-classes under it. As an example,
let's take the class of Vehicles and find an industry standard
identifier (VIN), and add two mutually exclusive sub-classes, Sport
utility vehicles and sedans ('SUV', 'SED').

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
UNIQUE (vin, vehicle_type),
..);

Notice the overlapping candidate keys. I then use a compound candidate
key (vin, vehicle_type) and a constraint in each sub-class table to
assure that the vehicle_type is locked and agrees with the Vehicles
table. Add some DRI actions and you are done:

CREATE TABLE SUV
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
CHECK(vehicle_type = 'SUV'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type = 'SED'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

I can continue to build a hierarchy like this. For example, if I had
a Sedans table that broke down into two-door and four-door sedans, I
could use a schema like this:

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type IN ('2DR', '4DR', 'SED')),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE TwoDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '2DR' NOT NULL
CHECK(vehicle_type = '2DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE FourDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '4DR' NOT NULL
CHECK(vehicle_type = '4DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans (vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

The idea is to build a chain of identifiers and types in a UNIQUE()
constraint that go up the tree when you use a REFERENCES constraint.
Obviously, you can do variants of this trick to get different class
structures.

If an entity doesn't have to be exclusively one subtype, you play with
the root of the class hierarchy:

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
PRIMARY KEY (vin, vehicle_type),
..);

Now start hiding all this stuff in VIEWs immediately and add an
INSTEAD OF trigger to those VIEWs.
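As an illustration only (SQL Server syntax; the non-key columns are invented
since the ".." columns above are elided), a view over the Sedans branch plus
an INSTEAD OF trigger might look something like this:

CREATE VIEW SedanVehicles
AS
SELECT v.vin, v.vehicle_type, s.door_cnt   -- door_cnt is a made-up Sedans column
FROM Vehicles AS v
JOIN Sedans AS s
  ON s.vin = v.vin
 AND s.vehicle_type = v.vehicle_type;
GO

CREATE TRIGGER SedanVehicles_ins
ON SedanVehicles
INSTEAD OF INSERT
AS
BEGIN
    -- Insert the supertype row first, then the subtype row.
    INSERT INTO Vehicles (vin, vehicle_type)
    SELECT vin, 'SED' FROM inserted;

    INSERT INTO Sedans (vin, vehicle_type, door_cnt)
    SELECT vin, 'SED', door_cnt FROM inserted;
END;
GO

Callers then INSERT and SELECT against SedanVehicles and never see the two
underlying tables.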

Here is the link on Amazon.com for my new book on "Trees & Hierarchies
in SQL"

http://www.amazon.com/exec/obidos/tg...roduct-details
Jul 19 '05 #8
--CELKO-- (jc*******@earthlink.net) writes:
CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
UNIQUE (vin, vehicle_type),
..);

Notice the overlapping candidate keys. I then use a compound candidate
key (vin, vehicle_type) and a constraint in each sub-class table to
assure that the vehicle_type is locked and agrees with the Vehicles
table. Add some DRI actions and you are done:


Ah! Interesting trick! I'll save a mental note about that.

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 19 '05 #9
jc*******@earthlink.net (--CELKO--) wrote in message news:<18**************************@posting.google. com>...

Good stuff, but there's one thing I don't get.
Why would one need NOT NULL in the following DDL,

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type IN ('2DR', '4DR', 'SED')),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

since by definition a PRIMARY KEY cannot be null? Why would I tell
people "I'm an American and an American citizen" instead of just
saying "I'm an American"?
Jul 19 '05 #10
ro*************@yahoo.com (Robert Brown) wrote:

This solution does not seem to scale since for each value of
"discriminator" I need to perform a join with a different table. What
if I need to retrieve 1,000 employees at once?


Why would you ever need the attributes of more than one subtype? If
your query is about doctors, why would you need to join the table
about janitors into the query?


Because I need to display users and their heterogeneous attributes in
the UI as one list (e.g. in response to a search).


Uh, that isn't really an answer.
If your query is about employees, why would you need to know any of the
subtype attributes?


Same reason, our customer's HR department does not see the employees
as separate entities. To them an employee is an employee and they want
to see all the associated data while browsing employees. I guess you
could show the generic attributes on the first pass and then require
the end user to click to drill down and see the subtype attributes -
but that's not what they want.


It sounds to me like the real solution is to fire your HR department and
replace them with people who just do their jobs, rather than browsing
around in other employees' information out of idle curiosity.

If the main goal is to provide custom interfaces, one per job-type, that
support job-specific functions--and satisfying nosy Nellies is only
secondary--then I think I would go with the main-table / multiple sub-table
setup. When people want to see mixed aggregations of job-types for
no apparent reason, they can damn well wait the 5.2 seconds it will take to
assemble it.

If the main or only goal is to satisfy nosy Nellies, I would either go
with an auxiliary table of name-value pairs (and another auxiliary table
listing allowed or suggested attributes for each job type), or just
do away with all those tables and make a "free text" CLOB column on the
employee table (or call it XML rather than free text, if that type of
thing turns you on).

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
Jul 19 '05 #11
<ct****@hotmail.com> wrote in message
news:20*******************@newsreader.com...
ro*************@yahoo.com (Robert Brown) wrote:
If the main or only goal is to satisfy nosy Nellies, I would either go
with an auxiliary table of name-value pairs (and another auxiliary table
listing allowed or suggested attributes for each job type), or just
do away with all those tables and make a "free text" CLOB column on the
employee table (or call it XML rather than free text, if that type of
thing turns you on).


Yeah, right, make it XML, and watch your performance going down the drain.

If you want "flexible" structure, use auxilliary name-value table. Convert
it into nested collection if you have object fever.
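Reporting against the name-value rows usually means pivoting them back into
columns; a rough sketch against made-up tables (employees plus an
employee_attributes name-value table) could look like:

SELECT e.employee_id,
       MAX(CASE WHEN a.attribute = 'specialty'
                THEN a.attribute_data END) AS specialty,
       MAX(CASE WHEN a.attribute = 'drivers_license_id'
                THEN a.attribute_data END) AS drivers_license_id,
       MAX(CASE WHEN a.attribute = 'preferredbroomtype'
                THEN a.attribute_data END) AS preferredbroomtype
FROM employees e
LEFT JOIN employee_attributes a
  ON a.employee_id = e.employee_id
GROUP BY e.employee_id;

One CASE branch per attribute is exactly the "ugly reporting" mentioned
earlier, but the data does come back as ordinary columns.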

Jul 19 '05 #12
ct****@hotmail.com wrote:
My 2 cents thrown in below:


When I was dealing with large database systems, we tried
to make a distinction between "transaction" and "report"
and tried to get the response time requirements ignored
for "reports".

Anything which returns 1,000 rows is no longer a "transaction"
in my book. As Xho said, "... they can damn well wait ..."

If this were a print job, it would print on roughly 20 pages,
give or take. It would take whoever about 2 minutes
to even give it a cursory scan. About the same amount of time
to scroll down through a screenful of 1000 entries too,
probably more.

Questions I used to ask when confronted with this:
- What is the business need for this information?
- Can you usually wait for it overnight?
- In a pinch (when overnight isn't good enough), would
10 minutes be OK?
- If you really need it in 1-2 seconds, how much is your
hardware budget? (Because I'm going to need more/faster hardware
to satisfy this request in just a second or two.)

What actually used to worry me about having such "reports"
as on-line "transactions" was how they would impact the
response times for the other users who were doing
stuff which actually did require rapid response
(e.g. pull up an individual's medical history in an
emergency).

(I think that was actually 3 cents :)

NPL
--
"It is impossible to make anything foolproof
because fools are so ingenious"
- A. Bloch
Jul 19 '05 #13
da**@rock.com (DaaaDaaa) wrote in message news:<21*************************@posting.google.c om>...

Good stuff, but there's one thing I don't get.
Why would one need NOT NULL in the following DDL,
CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type IN ('2DR', '4DR', 'SED')),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

Methinks he means theoretically the primary key could be separate from
the vin definition. Just because Oracle disallows it doesn't mean it
is redundant. I don't know about ANSI or other db's offhand.
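For example, the key can be declared as a separate table-level constraint
rather than on the column itself; whether the column-level NOT NULL is then
redundant depends on the SQL dialect and standard version (a cut-down sketch
of the Sedans table, other columns omitted):

CREATE TABLE Sedans
(vin          CHAR(17) NOT NULL,
 vehicle_type CHAR(3)  DEFAULT 'SED' NOT NULL,
 PRIMARY KEY (vin),
 UNIQUE (vin, vehicle_type));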
since by definition a PRIMARY KEY cannot be null? Why would I tell
people "I'm an American and an American citizen" instead of just
saying "I'm an American"?


'Cause you are not these guys:
http://www.signonsandiego.com/news/m...m11deport.html
http://www.signonsandiego.com/uniont...n29deport.html

Obviously, the attributes of American are cultural, and of a citizen
are legal. A citizen could be raised abroad and not consider
themselves American, and it is possible to have null citizenship.
http://www.keepmedia.com/ShowItemDet...0032&oliID=213
http://www.hrw.org/reports/1995/Kuwait.htm

jg
--
@home.com is bogus.
Happy Birthday, Jerry Mathers.
Jul 19 '05 #14
Taking out of context slightly:
Tom Kyte's viewpoint on this kind of thing I think is:

"sounds like a single table design to me. I do not factor out 1:1
optional
relationships unless a table is "really really wide" -- really wide
being into 3 digits.

put the most likely to be null columns at the end of the create table
and when
they are NULL, they will consume NO space.

benchmark it. you would always have to join to pick up this optional
information (2 or 3 LIO's at least per row retrieved for each optional
set of
data) vs an extra 50 bytes of flags saying "this is null". I would go
for the
extra 50 bytes in a row that will be accessed via an index rather then
incurring 2/3 LIO's to read an index to access another table. "

The question was asking about subtype modeling, but the example the
questioner posed wasn't really a subtype. The info they were talking
about was most likely actually M:1 relationships, but Tom's comments
apply to the subtype situation you are discussing.

Your model and application are much simpler with this single table
model.
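In other words, something along these lines, with the rarely populated
subtype columns trailing (column names invented for illustration):

CREATE TABLE employees
(employee_id            INTEGER     NOT NULL PRIMARY KEY,
 employee_type          VARCHAR(20) NOT NULL,
 firstname              VARCHAR(50) NOT NULL,
 lastname               VARCHAR(50) NOT NULL,
 dateofbirth            DATE        NOT NULL,
 -- subtype-specific columns, left NULL for everyone else
 specialty              VARCHAR(50),
 board_certification_dt DATE,
 drivers_license_id     VARCHAR(20),
 preferredbroomtype     VARCHAR(30));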

Read the link for yourself to see the full thread:

http://asktom.oracle.com/pls/ask/f?p...12678058160686
Jul 19 '05 #15
Job Miller allegedly said, on my timestamp of 3/06/2004 7:08 PM:

put the most likely to be null columns at the end of the create table and
when they are NULL, they will consume NO space.

The problem is when you may have two or more of these
optional 1:1 relationships. What happens when you've got a bunch
of columns set to NULL, another bunch set to values, then another bunch
set to NULL again, and someone comes along and fills the first bunch
with values?
benchmark it. you would always have to join to pick up this optional
information (2 or 3 LIO's at least per row retrieved for each optional set
of data) vs an extra 50 bytes of flags saying "this is null". I would go
for the extra 50 bytes in a row that will be accessed via an index rather
then incurring 2/3 LIO's to read an index to access another table."
I wouldn't. If it made the app easier to document, code
and expand, I'd trade the extra 2 or 3 LIOs easily. Heck, I'd trade
10 or more LIOs easily, for that! That is the sort of price I don't
mind paying.

Assuming of course I'm not designing/writing the
next 10000000/TPS TPC benchmark.
Then again, how many of those has anyone found in real life?
Then again, how many of those do indeed need a 1:1?
It's all relative, no?

In general, designs calling for that sort of level of
specification are not for performance critical apps, which makes
the extra LIO quite acceptable.
Your model and application are much simpler with this single table
model.


I don't think so. I have to add a heap of IS [NOT] NULL combinations
to handle the existence or non-existence of subtypes. IS NULL is not
an easy condition to combine with others. And it makes the code
unnecessarily dense, as well as stopping me from ever being able
to run this app in a db that doesn't support NULLs the way
Oracle does. In all, not a good choice.
I'm afraid on that one I'll have to disagree with Tom, assuming once
again we are not talking about a TPC benchmark type of app.
--
Cheers
Nuno Souto
wi*******@yahoo.com.au.nospam
Jul 19 '05 #16
