Machine-level things like a BIT or BYTE datatype have no place in a
high-level language like SQL. SQL is abstract and defined without
regard to PHYSICAL implementation. This basic principle of data
modeling is called data abstraction.
Bits and bytes are the <i>lowest</i> units of hardware-specific,
physical implementation you can get. Are you on a high-end or low-end
machine? Does the machine have 8, 16, 32, 64, or 128 bit words? Two's
complement or one's complement math? Hey, the ANSI/ISO Standards allow
decimal machines, so bits do not have to exist at all! What about
NULLs? To be a SQL datatype, you have to have NULLs, so what is a NULL
bit? By definition, a bit is on or off and has no NULL. If your vendor
adds NULLs to BIT, how are the bit-wise operations defined? Oh, what a
tangled web we weave when first we mix logical and physical :)
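To make this concrete, here is a minimal sketch with invented table
and column names; the first version assumes a product (e.g. SQL
Server) that offers a proprietary BIT type, the second sticks to
Standard SQL:

CREATE TABLE Customers_bad -- proprietary; what is a NULL bit here?
(cust_nbr INTEGER NOT NULL PRIMARY KEY,
 is_active BIT);

CREATE TABLE Customers -- portable; the Standard defines the
(cust_nbr INTEGER NOT NULL PRIMARY KEY, -- NULL and CHECK() behavior
 active_flag CHAR(1)
  CHECK (active_flag IN ('Y', 'N'))); -- NULL = status unknown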
What do the implementations of the host languages do with bits? Did
you know that +1, +0, -0, and -1 have all been used for BOOLEANs, but
not consistently? In C#, Boolean values are 0/1 for FALSE/TRUE, while
VB.NET has Boolean values of 0/-1 for FALSE/TRUE -- and those are
proprietary languages from the same vendor. That means <i>all</i> the
host languages -- present, future, and not-yet-defined -- can be
different. Surely, no good programmer would ever write non-portable
code by getting down to such a low level as bit fiddling!!
There are usually two situations in practice. Either the bits are
individual attributes, or they are used as a vector to represent a
single attribute. In the first case, each flag is an encoding limited
to two values, which do not port to host languages or other SQLs,
cannot easily be understood by an end user, and cannot be expanded.
In the second case, what some newbies do -- still thinking in terms of
second and third generation programming languages, or even punch cards
-- is build a vector for a series of "yes/no" status codes, failing to
see the status vector as a single attribute. Did you ever play the
children's game "20 Questions"?
Imagine you have six components for a loan approval, so you allocate
bits in your second generation model of the world. You have 2^6 = 64
possible vectors, but only five of them are valid (e.g. you cannot be
rejected for bankruptcy and still have good credit). For your data
integrity, you can:
1) Ignore the problem. This is actually what <i>most</i> newbies do.
2) Write elaborate CHECK() constraints with user-defined functions or
proprietary bit-level library functions that cannot port and that run
like cold glue (see the sketch below).
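Here is a hedged sketch of option 2), assuming SQL Server's
proprietary & operator and invented names; none of it ports:

CREATE TABLE Loans
(loan_nbr INTEGER NOT NULL PRIMARY KEY,
 approval_bits INTEGER NOT NULL -- six "yes/no" flags in one word
  CHECK (approval_bits BETWEEN 0 AND 63 -- only six bits are defined
     AND (approval_bits & 3) <> 3)); -- bit 0 = bankrupt,
                                     -- bit 1 = good credit

Every extra business rule becomes another mask to maintain, and every
mask has to be duplicated in every host program that touches the word.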
Now we add a seventh condition to the vector -- which end does it go
on? Why? How do you get it in the right place on all the possible
hardware it will ever run on? Does all the code that references a bit
in a word by its position still work after the change?
You need to sit down and think about how to design an encoding of the
data that is high-level, general enough to expand, abstract, and
portable. For example, is that loan approval a hierarchical code? A
concatenation code? A vector code? Something else? Did you provide
codes for unknown, missing, and N/A values? It is not easy to design
such things!
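One hedged way to do it, redoing the hypothetical Loans table with
invented codes (substitute whatever the real five valid states are):
treat the whole approval status as a single attribute and put the
legal values in data, not in bit masks:

CREATE TABLE Loan_Approval_Status
(approval_status CHAR(2) NOT NULL PRIMARY KEY,
 status_desc VARCHAR(40) NOT NULL);

INSERT INTO Loan_Approval_Status
VALUES ('A1', 'approved'), -- invented codes; one row
       ('R1', 'rejected: bad credit'), -- per valid state in the
       ('R2', 'rejected: bankruptcy'), -- real model
       ('UK', 'unknown'),
       ('NA', 'not applicable');

CREATE TABLE Loans
(loan_nbr INTEGER NOT NULL PRIMARY KEY,
 approval_status CHAR(2) NOT NULL
  REFERENCES Loan_Approval_Status (approval_status));

A new status is one INSERT into the lookup table, not a re-shuffle of
bit positions in every program that ever touched the word.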
The problem is not DB2 (yes, if you want to be a bad programmer you
can kludge it with a UDF), but that you are still thinking too close
to the hardware. It takes a year or more of SQL programming to really
have a revelation and escape procedural, low-level programming habits.
I have a book on SQL programming style coming out later this year that
might help.