>> I looked all over for the standards document. Do
you know where I can download or buy it? <<
http://lists.oasis-open.org/archives.../msg00005.html
has a link to a .zip file, but there are others with summaries. Here is
a short summary of the NCITS L8 Metadata Standards Committee rules for
data elements:
http://pueblo.lbl.gov/~olken/X3L8/dr...raft.docs.html
The ISO-11179 Standard is broken down into six sections:
11179-1: Framework for the Specification and Standardization of Data
Elements Definitions
11179-2: Classification for Data Elements
11179-3: Basic Attributes of Data Elements
11179-4: Rules and Guidelines for the Formulation of Data
11179-5: Naming and Identification Principles for Data
11179-6: Registration of Data Elements
Since I cannot reprint the standard, let me remark on the highlights of
some of these sections.
Naming Data Elements
There are two extremes; either the name is so general it tells us
nothing ("id", "date", "amount", etc.) or it is made useless by telling
us too much ("type_code_id_value" , not to be confused with a mere
"type_code" of course!).
When you name a data element, it name for what it is NOT for where it is
stored, NOT for how it is represented. The "tbl-" prefix is
particularly silly, since SQL has only one data structure. Do you put
"n_" in front of every noun you write? Do you think that this would
make English easier to read?
Section 11179-4 has a good simple set of rules for defining a scalar
data element. A data definition shall:
a) be unique (within any data dictionary in which it appears)
b) be stated in the singular
c) state what the concept is, not only what it is not
d) be stated as a descriptive phrase or sentence(s)
e) contain only commonly understood abbreviations
f) be expressed without embedding definitions of other data elements or
underlying concepts
The document then goes on to explain how to apply these rules with
illustrations. There are three kinds of rules that form a complete
naming convention:
* Semantic rules based on the components of data elements.
* Syntax rules for arranging the components within a name.
* Lexical rules for the language-related aspects of names.
While the following naming convention is oriented to the development of
application-level names, the rule set may be adapted to the development
of names at any level.
Annex A of ISO 11179-5 gives an example of all three of these rules.
Levels of rules progress from the most general (conceptual) and become
more and more specific (physical). The objects at each level are called
data element components and they are assembled, in part or whole, into
names. The idea is that the final names will be both as discrete and
complete as possible.
While this formalism is nice in theory, names are subject to constraints
imposed by software limitations in the real world. Another problem is
that one data element may have many names depending on the context in
which it is used. It might be called something in a report and
something else in an EDI file. Provision for identification of
synonymous names is made through sets of name-context pairs in the
element description. Since many names may be associated with a single
data element, it is important to also use a unique identifier, usually
in the form of a number, to distinguish each data element from any
other. ISO 11179-5 discusses assigning this identifier at the
International registry level. Both the identifier and at least one name
are considered necessary to comply with ISO 11179-5. Each organization
should decide the form of identifier best suited to its individual
requirements.
Levels of Abstraction
Name development begins at the conceptual level. An object class
represents an idea, abstraction or thing in the real world, such as tree
or country. A property is something that describes all objects in the
class, such as height or identifier. This lets us form terms such as
"tree height" or "country identifier" from the combination of the class
and the property.
The level in the process is the logical level. A complete logical data
element must include a form of representation for the values in its data
value domain (the set of possible valid values of a data element). The
representation term describes the data element's representation class.
The representation class is equivalent to the class word of the
prime/class naming convention many data administrators are familiar
with. This gets us to "tree height measure", "country identifier name"
and "country identifier code" as possible data elements.
There is a subtle difference between "identifier name" and "identifier
code" and it might be so subtle that we do not want to model it. But we
would need a rule to drop the property term in this case. The property
would still exist as part of the inheritance structure of the data
element, but it would not be part of the data element name.
Some logical data elements can be considered generic elements if they
are well-defined and are shared across organizations. Country names and
country codes are well-defined in ISO Standard 3166, Codes for the
Representation of Names of Countries, and you might simply reference
this document.
Note that this is the highest level at which true data elements, by the
definition of ISO 11179, appear: they have an object class, a property,
and a representation.
The next is the application level. This is usually done with a
quantifier which applies to the particular application. The quantifier
will either subset the data value domain or add more restrictions to the
definition so that we work with only those values needed in the
application.
For example, assume that we are using ISO 3166 country codes, but we are
only interested in Europe. This would be a simple subset of the
standard, but it will not change over time. However, the subset of
countries with more than 20 cm of rain this year will vary greatly over
time.
Changes in the name to reflect this will be accomplished by addition of
qualifier terms to the logical name. For example, if an application of
Country name were to list all the countries a certain organization had
trading agreements with, the application data element would be called
Trading partner country name. The data value domain would consist of a
subset of countries listed in ISO 3166. Note that the qualifier term
trading partner is itself an object class. This relationship could be
expressed in a hierarchical relationship in the data model.
The physical name is the lowest level. These are the names which
actually appear in the database table column headers, file descriptions,
EDI transaction file layouts, and so forth. They may be abbreviations
or use a limited character set because of software restrictions.
However, they might also add information about their origin or format.
In a registry, each of the data element names and name components, will
always be paired with its context so that we know the source or usage of
the name or name component. The goal is to be able to trace each data
element from its source to wherever it is used, regardless of the name
it appears under.
I recommned that tables be given collective or plural names, since they
are sets and not scalar. If the table has one and only one row, then
use a singular name. UPPERCASE the keywords, capitalize schema objects
(tables, views, procs, etc.) and lowercase scalars (variables, column
names, etc.) Use underscores which make reading easy and are portable;
do not use special characters like '#', '$', '[', ']', etc. which some
SQL vendors allow.
I like postfixes that tell me something about the semantics of scale
used for the column, as per the above:
"_id" = Identifier, it is unique in the schema and refer to one entity
anywhere it appears in the schema. Never use "<table name>_id" ; that
is a name based on location and tell you this is probably not a real key
at all. Just plain "id" is too vague to be useful to anyone and will
screw up your data dictionary when you have to find a zillion of them,
all different,but with the same data element name (you do have one,
don't you?).
"_date" = date, temporal dimension.
"_nbr" = tag number; do not use "_no" since it looks like the Boolean
yes/no value
"_name" = explain itself, nominal scale
"_code" = expect to translate this for humans.
"_total" = a sum, an aggregated dimension which is logically different
from its parts.
"_seq" = sequence, ordinal numbering.
"_tally" = cardinal number. Also called an absolute scale.
Then an application might have some special situations with units of
measurement that need to shown on an attribute or dimension. And ALWAYS
check to see if there is an ISO standard for a data element.
Well, actually there was a key but I didn't know how to write the
script to recreate it. This is all pretty new to me. <<
The bad news is that SQL Server's tools for pulbishing the schema data
stinks. Lots of square brackets, propreitary code, the constraints get
loaded into ALTER statements, etc. In short, it is something for a
machine and not a human being to read.
--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.
*** Sent via Developersdex
http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!