Bytes | Developer Community

Representation for Heterogeneous Attribute Set

My company is working on a bond derivative portfolio analysis tool and
we're facing a problem that I did not see adequately addressed anywhere
in the literature. I really did RTFM. I'm very experienced in
relational modelling (10+ years), so this is not a case of not
understanding the principles. Here is the problem stripped of
irrelevant context. The problem below is simplified for the sake of the
example, so don't sweat the details.

THE PROBLEM

1. There are many types of bonds, each type has a different set of
attributes, different attribute names, different attribute datatypes.

For example, bond A has two variables: a yearly interest rate and a
date of issue. Bond B has five variables: an interest rate and 4 specific
dates on which various portions of principal need to be paid. Bond C
has a set of 4 variables: the interest rate in period 1, the interest rate in
period 2, the date on which the bond can be put back to the issuer,
and two dates on which the bond can be called by the issuer. And so on.

So, on the first attempt I could represent each bond type as its own
table. For example,

create table bond_type_a (rate INTEGER, issue_date DATE)
create table bond_type_b (rate INTEGER, principle_date1 DATE,
principle_date2 DATE, principle_date3 DATE, principle_date4 DATE)
create table bond_type_c (rate1 INTEGER, rate2 INTEGER, put_date DATE,
call_date DATE)

This is the nice relational approach but it does not work because:

2. There are many thousands of bond types, thus we would have to have
many thousands of tables, which is bad.
3. The client needs to be able to construct bond types on the fly
through the UI and add them to the system. Obviously, it would be bad if
each new type of bond created in the UI resulted in a new table.

4. When a user loads the bond portfolio it needs to be very fast. In
the table-per-type approach, if a user has 100 different types of bond
in the portfolio you would have to do 100 joins. This is a heavily
multi-user environment, so it's a non-starter. It's impossibly slow.

THE SOLUTIONS

So now that we've ditched the table-per-bond-type approach, we can consider
the following solutions (unpleasant from the relational point of
view):

1. Name-Value pairs.

create table bonds (bond_id INTEGER, bond_type INTEGER, attribute_id
INTEGER, value VARCHAR(255))

Comment: The client does not like this approach because they want to
run various kinds of reports and thus they do not want the values to
be stored as VARCHAR. They want the DB to enforce the datatype.
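A concrete sketch of the name-value layout and of the typing weakness the client objects to (SQLite via Python as a stand-in engine; the attribute ids and sample values below are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE bonds (
    bond_id INTEGER, bond_type INTEGER,
    attribute_id INTEGER, value VARCHAR(255))""")

con.executemany("INSERT INTO bonds VALUES (?,?,?,?)",
                [(1, 1, 101, "5"),            # bond A: yearly rate
                 (1, 1, 102, "2005-01-01")])  # bond A: issue date

# The drawback: every attribute is just text, so nothing in the
# database stops a nonsense value from being stored as a "rate".
con.execute("INSERT INTO bonds VALUES (1, 1, 101, 'not-a-rate')")
```

The schema accepts the bad row without complaint, which is exactly why the client wants the DB, not the application, to enforce datatypes.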
2. Typed Name-Value pairs.

create table bonds (bond_id INTEGER, bond_type INTEGER, attribute_id
INTEGER, int_val INTEGER, string_val VARCHAR(255), date_val DATE)

Comment: The client does not like this because the table is sparse.
Every row has two empty fields.
3. Link table with table per data type.

create table bonds (bond_id INTEGER)

create table bond_int_data (bond_id INTEGER REFERENCES bonds(bond_id),
value INTEGER)
create table bond_string_data (bond_id INTEGER REFERENCES
bonds(bond_id), value VARCHAR(255))
create table bond_date_data (bond_id INTEGER REFERENCES bonds(bond_id),
value DATE)

Comment: This meets most of the requirements but it just looks ugly.
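For concreteness, the link-table layout can be sketched like this (SQLite via Python for illustration). Two liberties are taken: an attribute_id column is added, which the DDL above omits, so values can be told apart, and CHECK(typeof(...)) stands in for the strict column typing a stricter engine gives you; the sample data is invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bonds (bond_id INTEGER PRIMARY KEY);

CREATE TABLE bond_int_data (
    bond_id INTEGER REFERENCES bonds(bond_id),
    attribute_id INTEGER,
    value INTEGER CHECK (typeof(value) = 'integer'));

CREATE TABLE bond_date_data (
    bond_id INTEGER REFERENCES bonds(bond_id),
    attribute_id INTEGER,
    value TEXT CHECK (date(value) IS NOT NULL));
""")
con.execute("INSERT INTO bonds VALUES (1)")
con.execute("INSERT INTO bond_int_data VALUES (1, 101, 5)")
con.execute("INSERT INTO bond_date_data VALUES (1, 102, '2005-01-01')")

# Unlike the plain VARCHAR value column, the datatype is now enforced:
try:
    con.execute("INSERT INTO bond_int_data VALUES (1, 101, 'oops')")
    enforced = False
except sqlite3.IntegrityError:
    enforced = True
```

Loading a portfolio then takes one SELECT per datatype table rather than one join per bond type, which is the property that makes this layout tolerable despite its looks.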
4. Dynamic Mapping

create table bonds (bond_id INTEGER, int_val1 INTEGER, int_val2 INTEGER,
date_val1 DATE, date_val2 DATE, string_val1 VARCHAR(255), string_val2
VARCHAR(255))

Then you have to add some dynamic mapping in your code which will
provide bond specific mapping (say, stored in an XML file). For
example,

For bond_A: yearly_rate maps to int_val1, issue_date maps to date_val1
For bond_C: rate1 maps to int_val1, rate2 maps to int_val2, put_date
maps to date_val1, call_date maps to date_val2

Comment: This is very good for performance because when I load a
portfolio of different bond types I can pull them all in with one SELECT
statement. However, this approach has the problem that the table is
sparse. The number of fields of each type has to be high enough to
accommodate the most complex bond, while simple bonds will only be using
two or three.

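The dynamic-mapping approach can be sketched end to end: one generic table, one SELECT, and a per-type map (a plain dict here, standing in for the XML mapping file). SQLite via Python is used for illustration; the attribute names follow the examples above, the sample values are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE bonds (
    bond_id INTEGER, bond_type TEXT,
    int_val1 INTEGER, int_val2 INTEGER,
    date_val1 TEXT, date_val2 TEXT)""")

MAPPING = {
    "A": {"yearly_rate": "int_val1", "issue_date": "date_val1"},
    "C": {"rate1": "int_val1", "rate2": "int_val2",
          "put_date": "date_val1", "call_date": "date_val2"},
}

con.execute("INSERT INTO bonds VALUES (1, 'A', 5, NULL, '2005-01-01', NULL)")
con.execute("INSERT INTO bonds VALUES (2, 'C', 8, 9, '2025-01-01', '2025-07-01')")

def load_portfolio():
    # One SELECT pulls every bond; the map renames the generic columns.
    cur = con.execute("SELECT * FROM bonds ORDER BY bond_id")
    cols = [d[0] for d in cur.description]
    portfolio = []
    for row in cur:
        rec = dict(zip(cols, row))
        attrs = {name: rec[col]
                 for name, col in MAPPING[rec["bond_type"]].items()}
        portfolio.append((rec["bond_id"], attrs))
    return portfolio
```

The single SELECT is what buys the performance; the cost, as noted, is the sparse generic columns.
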
THE QUESTIONS:

Are the four approaches I described above exhaustive? Are there any
other that I overlooked?

Jul 23 '05 #1

<ro*************@yahoo.com> wrote in message
news:11**********************@l41g2000cwc.googlegroups.com...
My company is working on a bond derivative portfolio analysis tool and
we're facing a problem that I did not see adequately addressed any
where in literature. I really did RTFM. I'm very experienced in
relational modelling (10+ years) so this is not a case of not
understanding the principles. Here is the problem stripped of
irrelevant context. The problem below is simplified for the sake of the
example so don't sweat the details.

[major snip]

Robert,

Normally, when I see a question like yours, I recommend doing a google
search on "relational model generalization".

However, I think, from the examples you provide, that you already know the
things most people learn from such a search.
You might want to do the search anyway, just to cross check your
conclusions.

What follows is, I am afraid, not terribly helpful. I write it in the hope
that it can lead somewhere constructive.

The nut of the problem, it seems to me, is the requirement that users can
define new types of bonds on the fly.

In classical relational design, a data analysis of the subject matter
reveals all the "entities" and "relationships between entities" that are in
the universe of discourse. From there, adding the attributes and domains
that describe all of the relevant data values is straightforward, if
tedious.

The next step is to design a relational model. In this step a fairly
limited, and very stable set of tuple types is derived. The tuple types are
comprehensive, in the sense that they permit stating all the facts in the
universe of discourse in terms of the data domains. You only need to add a
new tuple type when the universe of discourse expands.

The next step is to build a physical model, and build the database. When
that's done, we generally find that adding a new instance of an existing
tuple type can be done by inserting new data. Creating a new tuple type, on
the other hand, requires altering the metadata. That's classical relational
design and development.

From your outline of the problem, I'm pretty sure you already know
everything I've said up to this point, even if you would have phrased it
differently.

Now, here's the rub: Can you create a new bond type without creating a new
tuple type? I think the answer is no.
Can you create a new tuple type without altering metadata? I think the
answer is no.
Can you trust end users with the management of metadata? I think the answer
is no.

The usual solution, which you've called "dynamic" is to disguise metadata as
data, by storing it in user tables.
That may be the best you are going to do. But it still means that your
users are managing metadata, whether they know it or not.
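Disguising metadata as data can be made concrete: the attribute definitions for a user-invented bond type become ordinary rows, not ALTER TABLE statements. A minimal sketch (SQLite via Python; all table names, ids and datatypes here are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bond_types (
    type_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL);

CREATE TABLE bond_type_attributes (      -- metadata disguised as data
    type_id      INTEGER REFERENCES bond_types(type_id),
    attribute_id INTEGER,
    name         TEXT NOT NULL,
    datatype     TEXT NOT NULL CHECK (datatype IN ('int','date','string')));
""")

# The UI "creates a new bond type" by inserting rows:
con.execute("INSERT INTO bond_types VALUES (1, 'bond_type_a')")
con.executemany("INSERT INTO bond_type_attributes VALUES (?,?,?,?)",
                [(1, 101, 'yearly_rate', 'int'),
                 (1, 102, 'issue_date', 'date')])
```

The point stands either way: the users are editing metadata, the schema just no longer shows it.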

This is not the answer, but it's worth giving some thought.
Jul 23 '05 #2
<ro*************@yahoo.com> wrote in message
news:11**********************@l41g2000cwc.googlegroups.com...
<snip>
THE QUESTIONS:

Are the four approaches I described above exhaustive? Are there any
other that I overlooked?

I hesitate to mention this, but in case you are open to using data models
other than relational, this is a perfect application for tools that permit
weak typing, among other features. If you were to model this application in
XML documents, for example, ignoring DTDs and XSDs for now, you would see
that the pattern to the data that makes sense in your mind and the minds of
the users can also make sense to a database product. It simply isn't a good
match for today's RDBMS implementations, nor for a strongly typed
persistence engine.

--dawn
Jul 23 '05 #3
Do the auditors and the feds know that anyone can invent a new kind of
security in your system? I doubt it.

The classic scenario calls for a root class with all the common
attributes and then specialized sub-classes under it. As an example,
let's take the class of Vehicles and find an industry standard
identifier (VIN), and add two mutually exclusive sub-classes, Sport
utility vehicles and sedans ('SUV', 'SED').

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
UNIQUE (vin, vehicle_type),
..);

Notice the overlapping candidate keys. I then use a compound candidate
key (vin, vehicle_type) and a constraint in each sub-class table to
assure that the vehicle_type is locked and agrees with the Vehicles
table. Add some DRI actions and you are done:

CREATE TABLE SUV
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
CHECK(vehicle_type = 'SUV'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type = 'SED'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

I can continue to build a hierarchy like this. For example, if I had a
Sedans table that broke down into two-door and four-door sedans, I
could use a schema like this:

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type IN ('2DR', '4DR', 'SED')),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE TwoDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '2DR' NOT NULL
CHECK(vehicle_type = '2DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE FourDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '4DR' NOT NULL
CHECK(vehicle_type = '4DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans (vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

The idea is to build a chain of identifiers and types in a UNIQUE()
constraint that go up the tree when you use a REFERENCES constraint.
Obviously, you can do variants of this trick to get different class
structures.
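The chain can be exercised end to end. A sketch in SQLite (chosen only because it is easy to run; note that foreign key enforcement is opt-in there), with the same DDL shape as above and invented VINs:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # FKs are opt-in in SQLite
con.executescript("""
CREATE TABLE Vehicles
(vin TEXT NOT NULL PRIMARY KEY,
 vehicle_type TEXT NOT NULL
   CHECK (vehicle_type IN ('SUV', 'SED')),
 UNIQUE (vin, vehicle_type));

CREATE TABLE SUV
(vin TEXT NOT NULL PRIMARY KEY,
 vehicle_type TEXT DEFAULT 'SUV' NOT NULL
   CHECK (vehicle_type = 'SUV'),
 UNIQUE (vin, vehicle_type),
 FOREIGN KEY (vin, vehicle_type)
   REFERENCES Vehicles (vin, vehicle_type)
   ON UPDATE CASCADE
   ON DELETE CASCADE);
""")
con.execute("INSERT INTO Vehicles VALUES ('VIN-0001', 'SUV')")
con.execute("INSERT INTO SUV VALUES ('VIN-0001', 'SUV')")      # fine

# A sedan cannot sneak into the SUV sub-class table: the compound
# foreign key (vin, vehicle_type) finds no matching parent row.
con.execute("INSERT INTO Vehicles VALUES ('VIN-0002', 'SED')")
try:
    con.execute("INSERT INTO SUV VALUES ('VIN-0002', 'SUV')")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True
```

This is the "type is locked and agrees with the root table" property doing its job.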

If an entity doesn't have to be exclusively one subtype, you play with
the root of the class hierarchy:

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
PRIMARY KEY (vin, vehicle_type),
..);

Now start hiding all this stuff in VIEWs immediately and add an INSTEAD
OF trigger to those VIEWs.

Jul 23 '05 #4
Joe Celko,

I actually bought and studied back to back your book "Trees and
Hierarchies in SQL" in search of a solution to this problem. The book
is great and is the only one on the market (that I could find) that
deals head-on with the common problem of hierarchy representation in
SQL. I learned about the nested set model, path enumeration,
id-parentid, etc. It has earned a permanent place on my reference book
shelf, but unfortunately it does not address the problem I outlined in
this post. I'm not being facetious here - I really did like the book a
lot.

I am also well familiar with the post you provided in response to my
question. I studied it in the archives and I understand it. In fact, I
have used the approach you outlined elsewhere several times. I also own
your "SQL for Smarties" book, which is also very good. Now to address
your specific questions:
-CELKO- wrote:
Do the auditors and the feds know that anyone can invent a new kind of security in your system? I doubt it.
Robert replies:

This is not a trading system. It is an ANALYSIS tool where the analyst
can invent new bonds and other instruments and see how they behave and
how they affect the portfolio behaviour. A common way to analyze hedges
for a portfolio is to try out different possibilities to see which ones
work in a computer simulation. Then, once you find the right one, you
see if there are existing combinations of securities (bonds,
call/put options, floating bonds, commodity-linked bonds, etc.) that can
approximate the behaviour of the ideal hedge you modelled.

-CELKO- wrote:
The classic scenario calls for a root class with all the common
attributes and then specialized sub-classes under it. As an example,
let's take the class of Vehicles and find an industry standard
identifier (VIN), and add two mutually exclusive sub-classes, Sport
utility vehicles and sedans ('SUV', 'SED').
Robert replies:

1) This is the generalization hierarchy approach. It does not work in
this case since the subtypes are quite numerous and can be created
dynamically. Your beef was primarily with dynamic creation of
entities. That's actually not the only issue. Suppose we ditch
dynamic creation; you still have the problem that you have thousands of
tables, one for each bond type. If your portfolio has 800 bonds of 100
different types, then you need to do 100 joins when the user pulls up
the portfolio. The performance of that is far worse than that of the
aesthetically ugly name-value approach.

2) The generalization hierarchy makes sense *only* when many attributes
are shared and only some attributes are specific to the subtype. That
is not the case here; there is very little overlap in attributes among
the different bonds. The only attribute they all have in common is
perhaps maturity, but even here there are exceptions. For example, one
might think all debt instruments would have an interest rate. Not the
case; there are zero coupon bonds. You might think every debt
instrument has a maturity. Not the case; the maturity can be
conditional. And so on.

3) When I wrote the post I suspected, as Dawn Walthius wrote, that this
problem is simply not amenable to relational modelling. As several
posters noted, this use case can be very easily represented in an OO
system or in hierarchical persistence such as XML. I do not see a
good solution for this in relational terms. It's a choice among
imperfect alternatives.

Jul 23 '05 #5
"David Cressey" <da***********@earthlink.net> wrote in message
news:7J****************@newsread2.news.atl.earthlink.net...
Can you trust end users with the management of metadata? I think the answer is no.


To be continued?

Can the end user be trusted to enter a primary key attribute?

Can the end user be trusted remembering the password?

Can concurrency control rely on end users being able to commit or rollback
their transactions in time?

Jul 23 '05 #6
"David Cressey" <da***********@earthlink.net> wrote in message
news:7J****************@newsread2.news.atl.earthlink.net...
Now, here's the rub: Can you create a new bond type without creating a new tuple type? I think the answer is no.
Can you create a new tuple type without altering metadata? I think the
answer is no.
Can you trust end users with the management of metadata? I think the answer is no.

The usual solution, which you've called "dynamic" is to disguise metadata as data, by storing it in user tables.
That may be the best you are going to do. But it still means that your
users are managing metadata, whether they know it or not.


Seriously, why is end users managing metadata a bad idea? Is security
compromised? Can users wreak havoc on the database schema, even with
properly designed roles and privileges? Also, isn't the line between data
and metadata somewhat artificial?
Jul 23 '05 #7
Neo
> Are there any other that I overlooked?

The relational model works best with data where groups of things have
similar attributes. When modelling data that is highly variable, one
method is generic modelling (i.e. everything in a few tables, possibly
even one), but it quickly becomes impractical due to the impact on
performance and diminished SQL support. There is no practical solution
for this type of problem using RM. The core problem is a limitation of
the RM data model and, to a lesser degree, limited processing power.

The script shown below, for an experimental db, models a few bonds. The
experimental db allows each thing to have a variable number of
classifications, attributes and values, and yet things are automatically
normalized by the db. Another such example is posted at
www.xdb2.com/example/ex108.asp

// Create items in directory to classify things
(CREATE *bond.item ~in = dir)
(CREATE *yearly rate.item ~in = dir)
(CREATE *amount.item ~in = dir)
(CREATE *principle portion payment.item ~in = dir)
(CREATE *period.item ~in = dir)
(CREATE *date.item ~in = dir)
(CREATE *due date.item ~in = dir)
(CREATE *issue date.item ~in = dir)
(CREATE *start date.item ~in = dir)
(CREATE *end date.item ~in = dir)
(CREATE *put back date.item ~in = dir)
(CREATE *call by issue date.item ~in = dir)

// Create bond A with a yearly rate and an issue date
(CREATE *bondA.cls = bond)
(CREATE bondA.yearly rate = +5%)
(CREATE bondA.issue date = (CREATE *1/1/05.cls = date
_ & it.cls = issue date))

// Create bond B with a yearly rate
// and 2 dates on which a portion of principal is due
(CREATE *bondB.cls = bond)
(CREATE bondB.yearly rate= +6%)
(CREATE bondB.principle portion payment =
_ (CREATE *.cls = principle portion payment
_ & it.amount = +$345
_ & it.due date = (CREATE *1/1/11.cls = date
_ & it.cls = due date)))
(CREATE bondB.principle portion payment =
_ (CREATE *.cls = principle portion payment
_ & it.amount = +$456
_ & it.due date = (CREATE *1/1/12.cls = date
_ & it.cls = due date)))

// Create bond C with different rates in 2 periods,
// one put back date and two call by issue dates.
(CREATE *bondC.cls = bond)
(CREATE bondC.period =
_ (CREATE *.cls = period
_ & it.yearly rate = +8%
_ & it.start date = (CREATE *1/1/05.cls = date
_ & it.cls = start date)
_ & it.end date = (CREATE *1/31/05.cls = date
_ & it.cls = end date)))
(CREATE bondC.period =
_ (CREATE *.cls = period
_ & it.yearly rate = +9%
_ & it.start date = (CREATE *2/1/05.cls = date
_ & it.cls = start date)
_ & it.end date = (CREATE *2/28/05.cls = date
_ & it.cls = end date)))
(CREATE bondC.put back date = (CREATE *1/1/25.cls = date
_ & it.cls = put back date))
(CREATE bondC.call by issue date =
_ (CREATE *7/1/25.cls = date
_ & it.cls = call by issue date))
(CREATE bondC.call by issue date =
_ (CREATE *9/1/25.cls = date
_ & it.cls = call by issue date))

// Find bonds that have a period that earns 9% yearly
// and have a call by issue date of 9/1/25.
// Finds bond C.
(SELECT *.cls = bond
_ & *.period = (*.yearly rate = 9%)
_ & *.call by issue date = 9/1/25)

Jul 23 '05 #8
> 1. Name-Value pairs.

create table bonds (bond_id INTEGER, bond_type INTEGER, attribute_id
INTEGER, value VARCHAR(255))

Comment: The client does not like this approach because they want to
run various kinds of reports and thus they do not want the values to
be stored as VARCHAR. They want the DB to enforce the datatype.


One possible approach is to use a Variant datatype. In MSSQL, the
"sql_variant" datatype can hold any non-BLOB value. Then add a DATA_TYPE
column where: null is date, 0 is varchar, and 1 is integer. Use a
trigger to ensure that the VALUE can be explicitly cast to the
specified data type.

CREATE TABLE bonds (bond_id INTEGER, bond_type INTEGER, attribute_id
INTEGER, data_type BIT, value SQL_VARIANT)

Jul 23 '05 #9
I ran into a similar problem with a Rent and Royalties business domain
about 20 years ago.
I found that, instead of the typical attribute modeling approach such
as in Joe Celko's example, it works better to define the appropriate
specialized sub-classes based on the algorithms.

Here is a simple example with Loans where the Interest Calculation
method changes over time.

Create table Loans ( Loan_PrimaryKey , StartDate, TermYears , ....)

-- For the Mortgage Interest Algorithms where the InterestMethod is
either Fixed or Variable at any point in time
Loan_Interest ( Loan_PrimaryKey , StartDate , InterestMethod )

-- Here is the sub-type for the Fixed algorithm:
Loan_Interest_Fixed ( Loan_PrimaryKey , StartDate , InterestRate )

-- Here is the sub-type for the Variable algorithm:
Loan_Interest_Variable ( Loan_PrimaryKey , StartDate,
InterestMarginRate, InterestIndex_PrimaryKey )

Similarly, there would be algorithm tables for each of the other Loan
algorithm factors, such as Balloon Payments, Payment Frequency,
Insurance, etc. Each of these could have their own sub-classes, which
could have sub-classes.

These same structures could be used regardless of whether a Loan is for a
Home Mortgage, Home Equity or Line of Credit.
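On the application side, algorithm-based sub-typing means each row carries its method and the code dispatches to the matching calculation. A hedged sketch: the column names follow the tables above, but the rate arithmetic and the "LIBOR" index key are invented for illustration:

```python
def yearly_interest(balance, row, index_rates=None):
    """Dispatch on InterestMethod, mirroring the sub-type tables."""
    if row["InterestMethod"] == "Fixed":
        # Fixed sub-type carries its own rate.
        return balance * row["InterestRate"]
    if row["InterestMethod"] == "Variable":
        # Variable sub-type is a margin over an external index.
        base = index_rates[row["InterestIndex_PrimaryKey"]]
        return balance * (base + row["InterestMarginRate"])
    raise ValueError("unknown method: " + row["InterestMethod"])

fixed = {"InterestMethod": "Fixed", "InterestRate": 0.05}
variable = {"InterestMethod": "Variable", "InterestMarginRate": 0.02,
            "InterestIndex_PrimaryKey": "LIBOR"}
```

Adding a new interest algorithm is then one new sub-type table plus one new branch, not a schema-wide change.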

Hope this Helps.

Jul 23 '05 #10
Here's some sample MSSQL code for the Variant approach.

CREATE TABLE bonds (bond_id INTEGER, bond_type INTEGER, attribute_id
INTEGER, data_type BIT, value SQL_VARIANT)
GO
CREATE TRIGGER bonds_value_validate
ON dbo.bonds
FOR INSERT, UPDATE
AS
BEGIN
IF EXISTS(
SELECT *
FROM INSERTed
WHERE SQL_VARIANT_PROPERTY(value,'BaseType') <>
CASE WHEN data_type IS NULL THEN 'datetime'
WHEN data_type = 0 THEN 'varchar'
WHEN data_type = 1 THEN 'int' END
) BEGIN
ROLLBACK
RAISERROR ('Bad Value', 16, 10)
END
END
GO
INSERT bonds VALUES (1,1,1,null,cast('20050101' as datetime))
GO
INSERT bonds VALUES (1,1,2,0,'abcdef')
GO
INSERT bonds VALUES (1,1,3,1,12345678)
GO
-- this one will throw an error
INSERT bonds VALUES (1,1,3,0,12345678)
GO

SELECT data_type, cast(value as char(11)),
cast(SQL_VARIANT_PROPERTY(value,'BaseType') as char(10))
FROM bonds

/* output:
NULL Jan 1 2005 datetime
0 abcdef varchar
1 12345678 int
*/

Jul 23 '05 #11
>> I actually bought and studied back to back your book "Trees and
Hierarchies in SQL" in search of the solution to this problem. The book
is great and is the only one on the market (that I could find) that
deals head-on with the common problem of hierarchy represnetation in
SQL. <<

My house payment thanks you! Moreau and Ben-Gan have a chapter on path
enumeration models in their book, but I am the only game in town.
>> I also own your "SQL for smarties book" which is also very good. <<
The third edition will be out this year; I just mailed off the
manuscript to Morgan-Kaufmann last night. I also have a small book on
SQL Programming Style getting typeset now.
>> This is not a trading system. It is an ANALYSIS tool where the analyst
can invent new bonds and other instruments and see how they behave and
how they affect the portfolio behaviour .. a computer simulation. <<

That might be your answer; have you thought about getting a simulation
language or tool? There is SLAM, Simula, and probably a dozen other
names I don't remember or never knew.

Jul 23 '05 #12
(ro*************@yahoo.com) writes:
1. There are many types of bonds, each type has a different set of
attributes, different attribute names, different attribute datatypes.
I don't envy you. Our customers are mainly focused on stock trading,
and their volume of bonds is marginal. Nevertheless, bonds are by far
the most complex of our instrument groups. Your needs go far beyond
ours, that's for sure.
2. Typed Name-Value pairs.

create table bonds (bond_id INTEGER, bond_type INTEGER, attribute_id
INTEGER, int_val INTEGER, string_val VARCHAR(255), date_val DATE)

Comment: The client does not like this because the table is sparse.
Every row has two empty fields.
Louis had an example. Here is another one, where I use a constraint
instead of a trigger (it's a real-world example, with some of the
columns deleted for brevity):

CREATE TABLE systemparameters (
sypcode aba_sypcode NOT NULL,
...
typeofdata aba_type NOT NULL
CONSTRAINT ckc_syp_typeofdata
CHECK (typeofdata IN ('B', 'D', 'F', 'I', 'V')),
sypvalue sql_variant NULL
CONSTRAINT ckc_syp_sypvalue
CHECK (sql_variant_property(sypvalue, 'MaxLength') <= 255),
...
CONSTRAINT pk_syp PRIMARY KEY (sypcode),
CONSTRAINT ckt_syp_datatype_sypvalue CHECK
(sql_variant_property(sypvalue, 'basetype') =
CASE typeofdata
WHEN 'B' THEN 'bit'
WHEN 'D' THEN 'datetime'
WHEN 'F' THEN 'float'
WHEN 'I' THEN 'int'
WHEN 'V' THEN 'varchar'
END)
)

I understand from your broad cross-post that you may not be using
MS SQL Server, or at least not only MS SQL Server. In such a case
sql_variant is not likely to save the day, since it's not portable.
You could use a varchar column instead of sql_variant, but keep the
typeofdata. You would then have to use a trigger to verify that the
varchar data conforms with the declared type, but this can be quite ugly,
not least for decimal values.
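The portable fallback (a plain value column plus a validating trigger) can be sketched in SQLite, whose typeof() and RAISE() make the idea concrete; the NULL/0/1 encoding of data_type follows the earlier sql_variant example, and the sample rows are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bonds (
    bond_id INTEGER, bond_type INTEGER, attribute_id INTEGER,
    data_type INTEGER,          -- NULL = date, 0 = varchar, 1 = integer
    value);                     -- deliberately untyped

CREATE TRIGGER bonds_value_validate
BEFORE INSERT ON bonds
FOR EACH ROW
WHEN (NEW.data_type IS NULL AND date(NEW.value) IS NULL)
  OR (NEW.data_type = 0 AND typeof(NEW.value) <> 'text')
  OR (NEW.data_type = 1 AND typeof(NEW.value) <> 'integer')
BEGIN
  SELECT RAISE(ABORT, 'Bad Value');
END;
""")
con.execute("INSERT INTO bonds VALUES (1, 1, 1, NULL, '2005-01-01')")
con.execute("INSERT INTO bonds VALUES (1, 1, 2, 0, 'abcdef')")
con.execute("INSERT INTO bonds VALUES (1, 1, 3, 1, 12345678)")

# Declared varchar but given an integer: the trigger rejects it.
try:
    con.execute("INSERT INTO bonds VALUES (1, 1, 3, 0, 12345678)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

As noted, this gets ugly fast for decimals, where a textual check cannot easily express scale and precision.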
Are the four approaches I described above exhaustive? Are there any
other that I overlooked?


One big table with all attributes known today, and everything nullable
except id and type? Of course, then you would not know which attributes
are mandatory for which types, unless you were to generate check constraints
dynamically.

No, it does not sound very palatable. Either.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #13
