472,146 Members | 1,395 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,146 software developers and data experts.

NULL vs. "Unknown Value"

I'm running into a situation that has me adding a value of "Unknown" to
a reference table. I am being pulled between two trains of thought, and
was curious to get other's input on in. I give an example below.

1-) Adding "Unknown" to a reference table is bad. Doing so effectively
changes the Nullability option of every FK that references the table to
a NULLable FK relation.

2-) Simply adding a "Not Known/Undetermined" value to the reference
table greatly simplifies things. No schema changes are required, and
programs that use the reference table to populate their drop-downs will
automatically see the new value.

Perhaps both approaches are good, but it would all depend of the
context, the criticality of other FKs that reference the table, how/when
the data is being used?

==============================================
EXAMPLE
==============================================

Assume two tables. Employee & EyeColor, as described below.

+===================================+
|Employee |
+----------------+------------------+
|EmployeeId(PK) | EyeColorId (FK) |
|NOT NULL | NOT NULL |
+----------------+------------------+
|marc | 1 |
|dan | 2 |
|sonya | 1 |
+================+==================+

+================================================= ===+
|EyeColor |
+---------------+------------------+-----------------+
|EyeColorId(PK) | EnglishName(AK1) | FrenchName(AK2) |
|NOT NULL | NOT NULL | NOT NULL |
+---------------+------------------+-----------------+
| 1 | Brown | Brun |
| 2 | Bloodshot | Rouge |
| 3 | Blue | Bleue |
+===============+==================+============== ===+

And let's say that an automated process is being built to import Eye
Colors from central database. In this process EyeColor may no longer be
available.

With solution #1, new (or existing) data is changed as follows:
+----------------+------------------+
|EmployeeId(PK) | EyeColorId (FK) |
|NOT NULL | NULL |
+----------------+------------------+
|marc | 1 |
|dan | 2 |
|sonya | 1 |
|newemp | NULL |
+================+==================+

+================================================= ===+
|EyeColor |
+---------------+------------------+-----------------+
|EyeColorId(PK) | EnglishName(AK1) | FrenchName(AK2) |
|NOT NULL | NOT NULL | NOT NULL |
+---------------+------------------+-----------------+
| 1 | Brown | Brun |
| 2 | Bloodshot | Rouge |
| 3 | Blue | Bleue |
+===============+==================+============== ===+

With solution #2, new (or existing) data is changed as follows:
+----------------+------------------+
|EmployeeId(PK) | EyeColorId (FK) |
|NOT NULL | NULL |
+----------------+------------------+
|marc | 1 |
|dan | 2 |
|sonya | 1 |
|newemp | 0 |
+================+==================+

+================================================= ===+
|EyeColor |
+---------------+------------------+-----------------+
|EyeColorId(PK) | EnglishName(AK1) | FrenchName(AK2) |
|NOT NULL | NOT NULL | NOT NULL |
+---------------+------------------+-----------------+
| 0 | Unknown | Inconnu |
| 1 | Brown | Brun |
| 2 | Bloodshot | Rouge |
| 3 | Blue | Bleue |
+===============+==================+============== ===+
*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Jul 20 '05 #1
3 3232
Marcus wrote:
I'm running into a situation that has me adding a value of "Unknown" to
a reference table. I am being pulled between two trains of thought, and
was curious to get other's input on in. I give an example below.

1-) Adding "Unknown" to a reference table is bad. Doing so effectively
changes the Nullability option of every FK that references the table to
a NULLable FK relation.

You could fix this with check constraints. A bigger issue is that null
is intended to mean unknown (although it gets used for absent, as well).
By using a real value for unknown, three-valued logic will only come up
when it's inappropriate (as I would say absent really is a value, and
should be treated as such in comparisons).

If the distinction between absent and unknown is important for your
application, I would leave the unknowns as null, and put a row into
the reference table for absent. This is cleanest with a identity
keys, because checking for a magic value that means absent will be
inevitable, and either 0 or -1 is a fairly obvious choice that can
be the same for all tables.

(To head off a repeat discussion, we'll acknowledge that some people --
Joe Celko foremost among them -- consider identity keys evil, as they
are sort-of reminiscent of file pointers. I consider meaningful keys
to be evil, as they violate the spirit of the 1NF and 2NF rules.
I don't think anyone is likely to be convinced here.)

Bill

Jul 20 '05 #2
William Cleveland wrote:
Marcus wrote:
I'm running into a situation that has me adding a value of "Unknown" to
a reference table. I am being pulled between two trains of thought, and
was curious to get other's input on in. I give an example below.

1-) Adding "Unknown" to a reference table is bad. Doing so effectively
changes the Nullability option of every FK that references the table to
a NULLable FK relation.

You could fix this with check constraints. A bigger issue is that null
is intended to mean unknown (although it gets used for absent, as well).
By using a real value for unknown, three-valued logic will only come up
when it's inappropriate (as I would say absent really is a value, and
should be treated as such in comparisons).

If the distinction between absent and unknown is important for your
application, I would leave the unknowns as null, and put a row into
the reference table for absent. This is cleanest with a identity
keys, because checking for a magic value that means absent will be
inevitable, and either 0 or -1 is a fairly obvious choice that can
be the same for all tables.

(To head off a repeat discussion, we'll acknowledge that some people --
Joe Celko foremost among them -- consider identity keys evil, as they
are sort-of reminiscent of file pointers. I consider meaningful keys
to be evil, as they violate the spirit of the 1NF and 2NF rules.
I don't think anyone is likely to be convinced here.)

Bill


One thing to consider is that when using joins, imagine a database with
two tables, Patients and Blood, if blood type is unknown and you use a
query to match up patients with bags of blood by blood type then the
results can be deadly. Extreme example I know but food for thought.

--
Error reading sig - A)bort R)etry I)nfluence with large hammer
Jul 20 '05 #3
>> I'm running into a situation that has me adding a value of
"Unknown" to a reference table. <<

Did you mean "Referenced" table?
1-) Adding "Unknown" to a reference table is bad. <<
No it is not. Get a copy of DATA & DATABASES and read the chapters on
missing values and designing encoding schemes.

A good encoding scheme can have several kinds of missing values.
Example; ICD disease codes use 000.000 for "undiagnosed, just
admitted" and 999.999 for "admited, diagnosed and we don't know what
this crud is". Likewise a good survey form DB has {'yes', 'no', 'not
answered', 'N/A'}, the ISO sex codes are {0 =unknown, 1 =male, 2
=female, 9 =lawful person such as corporations, etc. }

Even Dr. Codd had two kinds of NULLs in the second version of the
Relational model (missing value and missing attribute). SPARC listed
over 20 kinds of missing values in a 1975 paper.
Perhaps both approaches are good, but it would all depend of the

context, .. <<

Bingo! The NULL is portable and it has particular behavior. I use
NULLs in (start_time, finish_time) duration pairs as an "eternity"
symbol because I can say "COALESCE (finish_time, CURRENT_TIMESTAMP)"
to get a truly current report. But time is a continuum and encodings
are discrete sets of values.

I'd probably make 'Unknown' the DEFAULT to take advantage of the DRI
actions, too.
Jul 20 '05 #4

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

3 posts views Thread by Karunakararao | last post: by
9 posts views Thread by Klaus Johannes Rusch | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.