Why Cluster a Primary Key?

Philip Yale

I'm probably going to get shot down with thousands of reasons for
this, but I've never really heard or read a convincing explanation, so
here goes ...

Clustered indexes are more efficient at returning large numbers of
records than non-clustered indexes. Agreed? (Assuming the NC index
doesn't cover the query, of course)

Since it's only possible to have one clustered index, why is this
almost always used for the primary key, when by definition a primary
key will always return 1 record?

Isn't it generally better to specify a non-clustered index for the
primary key, and reserve the clustered index for a column which will
most likely be used for queries that return multi-row data sets (e.g.
date columns)?

Also, if you are using a sequential key, clustering this will cause an
insert hotspot on the last page of the table, which can cause
concurrency problems if you aren't using row-level locking. If you're
using a random clustered key then inserts will generally be improved,
assuming you're using a sensible fillfactor, but you still lose the
advantage of using the clustered index for multi-record retrieval.
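
To make the alternative above concrete, here is a minimal T-SQL sketch. The table, column and index names (dbo.Invoices, InvoiceID, InvoiceDate and so on) are invented for illustration and are not taken from any schema in this thread.

```sql
-- Hypothetical table: all names are illustrative only.
CREATE TABLE dbo.Invoices
(
    InvoiceID   int      NOT NULL,
    InvoiceDate datetime NOT NULL,
    Amount      money    NOT NULL,
    -- Enforce the key with a non-clustered index instead of the default.
    CONSTRAINT PK_Invoices PRIMARY KEY NONCLUSTERED (InvoiceID)
);

-- Spend the single clustered index on the column used for range scans.
CREATE CLUSTERED INDEX CIX_Invoices_InvoiceDate
    ON dbo.Invoices (InvoiceDate);

-- A multi-row range query of the kind the clustered index is meant to serve.
SELECT InvoiceID, InvoiceDate, Amount
FROM dbo.Invoices
WHERE InvoiceDate >= '20040301'
  AND InvoiceDate <  '20040401';
```

Whether this layout actually beats a clustered primary key depends on the workload, so it is worth comparing the query plans for both variants.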

I'd be very interested to hear other people's views on this.

Phil

Jul 20 '05 #1

Steve Jorgensen

The main reason I've found for clustering the primary key is that clustering
anything else will mess up front-end libraries including DAO and ADO, and
sometimes clustering the primary key seems to at least keep records together
that were entered close together in time, and those happen to be the ones
close together by date, which reduces the number of pages hit in date range
queries.

Personally, I almost always have something I'd rather cluster than the primary
key, but with DAO and ADO both assuming the clustered index is the primary key
even when something else actually is, it's just not workable. Either the
clustered index is unique and much larger than the PK leading to unnecessary
network traffic, or the clustered index is not unique, and the front-end
becomes confused that there seems to be more than one record with the same
key.

On 5 Mar 2004 03:56:38 -0800, ph********@btopenworld.com (Philip Yale) wrote:

> I'm probably going to get shot down with thousands of reasons for
> this, but I've never really heard or read a convincing explanation, so
> here goes ...
>
> Clustered indexes are more efficient at returning large numbers of
> records than non-clustered indexes. Agreed? (Assuming the NC index
> doesn't cover the query, of course)
>
> Since it's only possible to have one clustered index, why is this
> almost always used for the primary key, when by definition a primary
> key will always return 1 record?
>
> Isn't it generally better to specify a non-clustered index for the
> primary key, and reserve the clustered index for a column which will
> most likely be used for queries that return multi-row data sets (e.g.
> date columns)?
>
> Also, if you are using a sequential key, clustering this will cause an
> insert hotspot on the last page of the table, which can cause
> concurrency problems if you aren't using row-level locking. If you're
> using a random clustered key then inserts will generally be improved,
> assuming you're using a sensible fillfactor, but you still lose the
> advantage of using the clustered index for multi-record retrieval.
>
> I'd be very interested to hear other people's views on this.
>
> Phil

Jul 20 '05 #2

--CELKO--

>> Since it's only possible to have one clustered index, why is this
almost always used for the primary key, when by definition a primary
key will always return 1 record [sic]? <<

Actually, you hit the nail on the head and did not know it. When SQL
was first implemented, the mental and physical models for data were
based on files (Rows are not records; fields are not columns; tables
are not files). Files with sequential, contiguous storage and, in
particular, magnetic tape and punch cards (there is no sequential
access or ordering in an RDBMS, so "first", "next" and "last" are
totally meaningless).

A Master mag tape file is sorted on a key, usually at the front of the
records, just after the "deleted" flag. This is so that you can merge
the transaction tapes, also sorted on the same key, into the Master.

Dr. Codd also fell for this and began with the PRIMARY KEY in his first
papers on the relational model. A bit later, he caught the error and
realized that a relational key is a key is a key and none of them are
"more equal" than the others. Unfortunately, SQL was based on Codd's
first papers and carried the error forward.

Sybase simply used what was there in Unix and the existing file
systems to build SQL Server and Microsoft followed suit.

Are you familiar with the story of how the Roman Empire determined the
size of the Space Shuttle boosters and therefore most of the design of
the shuttle?

Jul 20 '05 #3

Philip Yale

Thanks for that, Celko. It's very interesting, although I must
confess that I'm not sure what it's got to do with my original
question? Whatever the background evolution of RDBMS systems, in the
real world today what people refer to as a "primary key" returns 1
row, and I feel that it's a bit of a waste putting a clustered index
on this.

BTW - I've heard the Roman theory many times, but this really is just
an urban myth. Railway tracks, for example, in the UK, have a gauge
of 4' 8.5" because this was what resulted from a standard axle width
of 5'. There are many other gauges throughout the world, and there's
a very good paper at
http://www.vwl.uni-muenchen.de/ls_komlos/northam.pdf which details
their evolution.

Jul 20 '05 #4

Philip Yale

Sorry, Joe - didn't mean to call you "Celko" in the previous reply; I
instinctively used your sign-on name!

Jul 20 '05 #5

Dan Guzman

> what people refer to as a "primary key" returns 1
> row, and I feel that it's a bit of a waste putting a clustered index
> on this.

Consider the "Orders" and "Order Details" tables in the sample Northwind
database. The primary keys are respectively OrderID and OrderID/ProductID.
Assuming these tables are frequently joined on OrderID, the Order Details
clustered primary key index reduces i/o and enhances join performance of
these queries.

Of course, there may be a better choice than the primary key for the
clustered index. It all depends on how the data are normally accessed and
there are often trade-offs involved.
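
As a rough sketch of the case described above, the following T-SQL uses cut-down stand-ins for the two Northwind tables. Most columns are omitted, and the constraint names and data types are assumptions rather than the sample database's actual definitions.

```sql
-- Simplified stand-ins for the Northwind tables; most columns are omitted.
CREATE TABLE dbo.Orders
(
    OrderID   int      NOT NULL,
    OrderDate datetime NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
);

CREATE TABLE dbo.[Order Details]
(
    OrderID   int      NOT NULL,
    ProductID int      NOT NULL,
    Quantity  smallint NOT NULL,
    -- Clustering on (OrderID, ProductID) stores all rows of one order together.
    CONSTRAINT PK_Order_Details PRIMARY KEY CLUSTERED (OrderID, ProductID),
    CONSTRAINT FK_Order_Details_Orders
        FOREIGN KEY (OrderID) REFERENCES dbo.Orders (OrderID)
);

-- The join can read one order's detail rows from adjacent positions
-- in the clustered index instead of doing a lookup per row.
SELECT o.OrderID, o.OrderDate, d.ProductID, d.Quantity
FROM dbo.Orders AS o
JOIN dbo.[Order Details] AS d
    ON d.OrderID = o.OrderID
WHERE o.OrderID = 10248;
```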

--
Hope this helps.

Dan Guzman
SQL Server MVP

Jul 20 '05 #6

Joe Celko

>> .. what people refer to as a "primary key" returns 1
row, and I feel that it's a bit of a waste putting a clustered index on
this. <<

I agree. But this was the default action in the original Sybase product
for the reasons I mentioned and it was carried forward. Programmers are
lazy and don't think; you learn by copying from old code.

Why is "i" used for a loop control variable in procedural languages?
Because in FORTRAN II, integer variable names began with the letters I through N.
Wouldn't it be better to come up with a meaningful name for the control
within the context of the loop? Sure!

Thanks for the railroad link! Another urban myth bites the dust! Want
to hear the ham bone parable instead :)?

--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

Jul 20 '05 #7

Joe Celko

>> Sorry, Joe - didn't mean to call you "Celko" in the previous reply; I
instinctively used your sign-on name! <<

That is what I go by; even my wife calls me "Celko" and my column in
INTELLIGENT ENTERPRISE is called "CELKO". My family was military and I
grew up in an environment where you used the last name. And thanks for
the link!

--CELKO--

Jul 20 '05 #8

Philip Yale

Thanks Dan.

I quite agree that there are occasions where a clustered primary key
is desirable, and that this is a decision which a DBA should take when
designing the physical database based on the data distribution and
access methods. My contention, though, is that this is often the
exception rather than the rule, contrary to the *default* action taken
when defining a primary key constraint or using a database design
package, both of which tend to assume that all primary keys will be
clustered.

Jul 20 '05 #9

Daniel Morgan

Philip Yale wrote:

> My contention, though, is that this is often the
> exception rather than the rule, contrary to the *default* action taken
> when defining a primary key constraint or using a database design
> package, both of which tend to assume that all primary keys will be
> clustered.

It wouldn't be if more people paid attention to relational database
theory and Joe Celko rather than have a knee-jerk reaction that every
table needs a surrogate key.

--
Daniel Morgan
http://www.outreach.washington.edu/e...ad/oad_crs.asp
http://www.outreach.washington.edu/e...oa/aoa_crs.asp
da******@x.washington.edu
(replace 'x' with a 'u' to reply)

Jul 20 '05 #10

Erland Sommarskog

Philip Yale (ph********@btopenworld.com) writes:

> Clustered indexes are more efficient at returning large numbers of
> records than non-clustered indexes. Agreed? (Assuming the NC index
> doesn't cover the query, of course)
>
> Since it's only possible to have one clustered index, why is this
> almost always used for the primary key, when by definition a primary
> key will always return 1 record?
>
> Isn't it generally better to specify a non-clustered index for the
> primary key, and reserve the clustered index for a column which will
> most likely be used for queries that return multi-row data sets (e.g.
> date columns)?

This is an old discussion, and often goes along with "why is the
primary key by default clustered"?

Well, to give my opinion on the last question: I think it is better
for a table to have a clustered index than not, so if the table has only
one index, it should be clustered. And in many cases that one index
will be the PK constraint.

As for clustering PKs at all, Dan gave the example that I immediately came
to think of. In the Orders table, OrderID is probably not the best column
to cluster on. Depending on your business, CustomerID or OrderDate are
better choices. But in OrderDetails, what on Earth would you cluster on,
if not the PK?

And this is the failure of people who question why it is good to cluster
the primary key: they forget that many tables have multiple-column keys,
and range queries on such tables often relate to the leading columns
in this index.

Daniel Morgan's observation that these tables are not as common as they should
be, because some people stick in an artificial identity PK in all their
tables is of course relevant. And that is true, if you live by that design
pattern, *then* you rarely have reason to cluster your PK. Then you should
cluster what should have been your primary key.
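
One way to read "cluster what should have been your primary key" is sketched below in T-SQL. The table dbo.OrderLines and all of its column and constraint names are hypothetical; the point is only that the identity column keeps the PRIMARY KEY constraint while the natural key gets the clustered index.

```sql
-- Hypothetical table using the identity-PK design pattern discussed above.
CREATE TABLE dbo.OrderLines
(
    OrderLineID int IDENTITY(1,1) NOT NULL,
    OrderID     int NOT NULL,
    ProductID   int NOT NULL,
    Quantity    int NOT NULL,
    -- The artificial key keeps the PRIMARY KEY constraint, but non-clustered.
    CONSTRAINT PK_OrderLines PRIMARY KEY NONCLUSTERED (OrderLineID),
    -- The natural key gets the clustered index.
    CONSTRAINT UQ_OrderLines_Order_Product
        UNIQUE CLUSTERED (OrderID, ProductID)
);

-- Range query on the leading column of the clustered key.
SELECT OrderID, ProductID, Quantity
FROM dbo.OrderLines
WHERE OrderID BETWEEN 10248 AND 10260;
```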

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 20 '05 #11

Erland Sommarskog

Steve Jorgensen (no****@nospam.nospam) writes:

> The main reason I've found for clustering the primary key is that
> clustering anything else will mess up front-end libraries including DAO
> and ADO, and sometimes clustering the primary key seems to at least keep
> records together that were entered close together in time, and those
> happen to be the ones close together by date which reduces the number of
> pages hit in date range queries.

But in that case you need a clustered index on the date. If you have a
non-clustered index on date, there will be one bookmark lookup for each
row. Sure, since they all go to the same pages, most lookups will be to
cache, but it's still significantly less efficient than a clustered index
seek.

> Personally, I almost always have something I'd rather cluster than the
> primary key, but with DAO and ADO both assuming the clustered index is
> the primary key even when something else actually is, it's just not
> workable. Either the clustered index is unique and much larger than the
> PK leading to unnecessary network traffic, or the clustered index is not
> unique, and the front-end becomes confused that there seems to be more
> than one record with the same key.

Huh? I don't know DAO, but that ADO should care about the physical
implementation is news to me. Care to elaborate this? Maybe a repro?

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Jul 20 '05 #12

Philip Yale

Daniel Morgan <da******@x.washington.edu> wrote in message news:<1078622124.796146@yasure>...

> Philip Yale wrote:
>> My contention, though, is that this is often the
>> exception rather than the rule, contrary to the *default* action taken
>> when defining a primary key constraint or using a database design
>> package, both of which tend to assume that all primary keys will be
>> clustered.
>
> It wouldn't be if more people paid attention to relational database
> theory and Joe Celko rather than have a knee-jerk reaction that every
> table needs a surrogate key.

I agree with you to a point. I hope I didn't create the impression
that I was having a "knee-jerk" reaction, and I do take serious issue
with the instruction in almost all SQLServer manuals that "every table
should have a primary key". I wasn't actually referring to surrogate
keys, but to real keys, and I'm sure you'll agree that they are valid
in very many cases due to the way that the SQLServer optimizer (and
Sybase, and I would guess Oracle, too, although I'm no Oracle expert)
is designed and built. All I was questioning was the automatic
tendency to make primary keys (even valid ones) clustered when they
would very often be just as well-served by a NC index, so leaving the
clustered index free for a more suitable key.

Incidentally, I stumbled across another surrogate-key discussion,
featuring Joe, Dan and others, at
http://www.mcse.ms/message296826.html . Clearly people have very
different and strongly-held views on all this!

Jul 20 '05 #13

Philip Yale

Erland Sommarskog <so****@algonet.se> wrote in message news:<Xn**********************@127.0.0.1>...

> Philip Yale (ph********@btopenworld.com) writes:
>> Clustered indexes are more efficient at returning large numbers of
>> records than non-clustered indexes. Agreed? (Assuming the NC index
>> doesn't cover the query, of course)
>>
>> Since it's only possible to have one clustered index, why is this
>> almost always used for the primary key, when by definition a primary
>> key will always return 1 record?
>>
>> Isn't it generally better to specify a non-clustered index for the
>> primary key, and reserve the clustered index for a column which will
>> most likely be used for queries that return multi-row data sets (e.g.
>> date columns)?
>
> This is an old discussion, and often goes along with "why is the
> primary key by default clustered"?
>
> Well, to give my opinion on the last question: I think it is better
> if a table has a clustered index than it does not, so if the only
> index you have on the table, it should be clustered. And if you only
> have one index on the table, in many cases that will be the PK constraint.
>
> As for clustering PKs at all, Dan gave the example that I immediately came
> to think of. In the Orders table, OrderID is probably not the best column
> to cluster on. Depending on your business, CustomerID or OrderDate are
> better choices. But in OrderDetails, what on Earth would you cluster on,
> if not the PK?

I quite agree. In that example, clustering the PK seems fine. The
essential word in my original question was "generally"; I question
whether it isn't generally better to use a NC index unless the
specific criteria of data and usage dictate otherwise.

> And this is the failure of people who question why it is good to cluster
> the primary key: they forgot that many tables have multiple-column keys,
> and range queries on such tables often relates to the top-most columns
> in this index.

Fair point. I've just spent some time composing a rebuttal of this,
but now I think I see what you mean:

e.g. unique PK_constraint (DateField datetime not null,
OtherField char(3) not null,
ID int not null)

Something like that? The only comment I'd make is that I was always
taught to make the leading column of any multi-column index the most
restrictive one, but since the overall combination here can be
declared UNIQUE I guess the optimizer will take that into account and
still use the index, even if the datefield itself is very
non-restrictive.
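
Written out as valid T-SQL, the sketch above might look something like the following. The table name dbo.Events and the constraint name are hypothetical; the column names and types come from the sketch itself.

```sql
-- Hypothetical table name; the columns follow the sketch above.
CREATE TABLE dbo.Events
(
    DateField  datetime NOT NULL,
    OtherField char(3)  NOT NULL,
    ID         int      NOT NULL,
    CONSTRAINT PK_Events PRIMARY KEY CLUSTERED (DateField, OtherField, ID)
);

-- A range predicate on the leading column can seek into the clustered index,
-- even though DateField alone is far from unique.
SELECT DateField, OtherField, ID
FROM dbo.Events
WHERE DateField >= '20040301'
  AND DateField <  '20040308';
```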

Jul 20 '05 #14

Erland Sommarskog

Philip Yale (ph********@btopenworld.com) writes:

> I quite agree. In that example, clustering the PK seems fine. The
> essential word in my original question was "generally"; I question
> whether it isn't generally better to use a NC index unless the
> specific criteria of data and usage dictate otherwise.

Then again, in many databases most of the tables only have one index and
that is the PK. And these tables do not need any more indexes, because
they are small lookup-tables.

I would like to put it in another way: if you can find another column to
cluster on, do it. But if you can't think of one, let the PK be
clustered.

But it should certainly be part of the data-modelling phase to identify
good columns to cluster on.

> Fair point. I've just spent some time composing a rebuttal of this,
> but now I think I see what you mean:
>
> e.g. unique PK_constraint (DateField datetime not null,
> OtherField char(3) not null,
> ID int not null)
>
> Something like that? The only comment I'd make is that I was always
> taught to make the leading column of any multi-column index the most
> restrictive one, but since the overall combination here can be
> declared UNIQUE I guess the optimizer will take that into account and
> still use the index, even if the datefield itself is very
> non-restrictive.

I don't really see where you are going here, but generally, if you need
to index on a non-restrictive column, a clustered index is probably better.

Say that you have a table with a status column, and a very common query
is to find all rows with status = 'N' (New). The table would have millions
of rows, but normally only a few hundred at a time may have N. A non-
clustered index may work, but if you have no other index to cluster on,
this status column (possibly in combination with some other column) is
a very good candidate. Since the column is always updated during the
life-time of the row, there is a cost for moving the data. But for finding
all newly arrived rows, this is great.
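
A minimal T-SQL sketch of the status-column case follows. The table dbo.Messages, its columns, and the status code 'P' are assumptions made up for the example; only the 'N' (New) status comes from the description above.

```sql
-- Hypothetical queue-style table; all names are illustrative.
CREATE TABLE dbo.Messages
(
    MessageID  int      NOT NULL,
    Status     char(1)  NOT NULL,   -- 'N' = new; other codes are assumptions
    ReceivedAt datetime NOT NULL,
    CONSTRAINT PK_Messages PRIMARY KEY NONCLUSTERED (MessageID)
);

-- Cluster on the status, combined with another column as suggested above.
CREATE CLUSTERED INDEX CIX_Messages_Status
    ON dbo.Messages (Status, ReceivedAt);

-- The few hundred new rows are stored together, so this is a short range scan.
SELECT MessageID, ReceivedAt
FROM dbo.Messages
WHERE Status = 'N';

-- Changing the status changes the clustering key, so the row has to move.
-- This is the cost mentioned above.
UPDATE dbo.Messages
SET Status = 'P'
WHERE MessageID = 42;
```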

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Jul 20 '05 #15

Daniel Morgan

Philip Yale wrote:

> Daniel Morgan <da******@x.washington.edu> wrote in message news:<1078622124.796146@yasure>...
>> Philip Yale wrote:
>>> My contention, though, is that this is often the
>>> exception rather than the rule, contrary to the *default* action taken
>>> when defining a primary key constraint or using a database design
>>> package, both of which tend to assume that all primary keys will be
>>> clustered.
>>
>> It wouldn't be if more people paid attention to relational database
>> theory and Joe Celko rather than have a knee-jerk reaction that every
>> table needs a surrogate key.
>
> I agree with you to a point. I hope I didn't create the impression
> that I was having a "knee-jerk" reaction, and I do take serious issue
> with the instruction in almost all SQLServer manuals that "every table
> should have a primary key". I wasn't actually referring to surrogate
> keys, but to real keys, and I'm sure you'll agree that they are valid
> in very many cases due to the way that the SQLServer optimizer (and
> Sybase, and I would guess Oracle, too, although I'm no Oracle expert)
> is designed and built. All I was questioning was the automatic
> tendency to make primary keys (even valid ones) clustered when they
> would very often be just as well-served by a NC index, so leaving the
> clustered index free for a more suitable key.

Unless you are one of those people that throws a surrogate key at
everything then the discussion about single column vs. multi-column
primary keys is a discussion about nothing. The natural key to a
record is the natural key whether that is a single column or ten.

One example we deal with repeatedly is the payroll record that often
looks something like this:

employee_id
date
project
task
hours

It takes four, often more, columns to define the natural primary key.
That's just the reality of the data. So do you throw a surrogate at it?
I hope not. A surrogate allows duplicate entries and the only way to
avoid the possibility of duplicates is to create a unique constraint
that duplicates the natural key.

There are places where a surrogate key is the best solution. But
it should be chosen as Option B after considering the implications on
scalability, performance, and data integrity of Option A: A natural
key.
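
The two options can be sketched in T-SQL roughly as follows. The table names, constraint names and data types are assumptions, and the "date" column is renamed work_date here purely to avoid clashing with a type name.

```sql
-- Option A: the natural key is the primary key. Types are assumptions.
CREATE TABLE dbo.Timesheet
(
    employee_id int          NOT NULL,
    work_date   datetime     NOT NULL,  -- "date" in the list above
    project     varchar(20)  NOT NULL,
    task        varchar(20)  NOT NULL,
    hours       decimal(4,2) NOT NULL,
    CONSTRAINT PK_Timesheet
        PRIMARY KEY (employee_id, work_date, project, task)
);

-- Option B: a surrogate key. Without the UNIQUE constraint, the same
-- employee/date/project/task combination could be entered twice.
CREATE TABLE dbo.TimesheetSurrogate
(
    timesheet_id int IDENTITY(1,1) NOT NULL,
    employee_id  int          NOT NULL,
    work_date    datetime     NOT NULL,
    project      varchar(20)  NOT NULL,
    task         varchar(20)  NOT NULL,
    hours        decimal(4,2) NOT NULL,
    CONSTRAINT PK_TimesheetSurrogate PRIMARY KEY (timesheet_id),
    CONSTRAINT UQ_TimesheetSurrogate_NaturalKey
        UNIQUE (employee_id, work_date, project, task)
);
```

Option B ends up maintaining two indexes to give the same uniqueness guarantee that Option A gets from one.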

--
Daniel Morgan

Jul 20 '05 #16

Steve Jorgensen

Well, it seems I owe y'all an apology. I'm 100% certain that what I said here
was true not very long ago because I had to debug the problems it caused, but
I see that with current versions of SQL Server, MDAC, and DAO, this is no
longer a problem with either DAO or ADO. I just ran tests both with an MDB
and ADP front-end and could not duplicate the issue.

That means you should ignore my earlier post, and it means I can go back and
add appropriate clustered indexes to all my tables that currently don't have
them and could use them.

On Sun, 7 Mar 2004 11:42:00 +0000 (UTC), Erland Sommarskog <so****@algonet.se>
wrote:

> Steve Jorgensen (no****@nospam.nospam) writes:
>> The main reason I've found for clustering the primary key is that
>> clustering anything else will mess up front-end libraries including DAO
>> and ADO, and sometimes clustering the primary key seems to at least keep
>> records together that were entered close together in time, and those
>> happen to be the ones close together by date which reduces the number of
>> pages hit in date range queries.
>
> But in that case you need a clustered index on the date. If you have a
> non-clustered index on date, there will be one bookmark lookup for each
> row. Sure, since they all go to the same pages, most lookups will be to
> cache, but it's still significantly less efficient than a clustered index
> seek.
>
>> Personally, I almost always have something I'd rather cluster than the
>> primary key, but with DAO and ADO both assuming the clustered index is
>> the primary key even when something else actually is, it's just not
>> workable. Either the clustered index is unique and much larger than the
>> PK leading to unnecessary network traffic, or the clustered index is not
>> unique, and the front-end becomes confused that there seems to be more
>> than one record with the same key.
>
> Huh? I don't know DAO, but that ADO should care about the physical
> implementation is news to me. Care to elaborate this? Maybe a repro?

Jul 20 '05 #17

Erland Sommarskog

Daniel Morgan (da******@x.washington.edu) writes:

> One example we deal with repeatedly is the payroll record that often
> looks something like this:
>
> employee_id
> date
> project
> task
> hours
>
> It takes four, often more, columns to define the natural primary key.
> That's just the reality of the data. So do you throw a surrogate at it?
> I hope not. A surrogate allows duplicate entries and the only way to
> avoid the possibility of duplicates is to create a unique constraint
> that duplicates the natural key.

So in my database I have a table for collateral claims which has a four-
column key:

accno - the account.
depno - the depot for the account (most accounts have only 1)
currency - which currency the claim is in
place - the place that holds the claim.

So far so good, but then came the requirement to keep track of the positions
to which the claim related. That calls for two more key columns in the
sub-table:

insid - The instrument which is cause for the claim.
poseffect - Whether the claim is for held or written options, bought/sold
futures etc.

A six-column key looked frightening enough, so I added a surrogate key to
the main table (with a UNIQUE constraint on the original PK) and used
this as a FK in the sub-table. But having programmed some against these
tables recently, I'm starting to think that was a mistake. Often you
want to manipulate the claim for a certain place. But to do that for
the sub-table, you always have to join with the main table, leading to more
complex SQL.
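
The shape of that design, and the extra join it forces, can be sketched in T-SQL as below. The column names come from the description above, but the table names, the surrogate column claimid, the constraint names, the data types and the literal 'XYZ' are all assumptions for illustration.

```sql
-- Sketch only: table names, constraint names and data types are assumptions.
CREATE TABLE dbo.claims
(
    claimid  int IDENTITY(1,1) NOT NULL,   -- the surrogate key
    accno    int     NOT NULL,
    depno    int     NOT NULL,
    currency char(3) NOT NULL,
    place    char(3) NOT NULL,
    CONSTRAINT pk_claims PRIMARY KEY (claimid),
    -- The original four-column key is kept as a UNIQUE constraint.
    CONSTRAINT u_claims UNIQUE (accno, depno, currency, place)
);

CREATE TABLE dbo.claimpositions
(
    claimid   int     NOT NULL,
    insid     int     NOT NULL,
    poseffect char(1) NOT NULL,
    CONSTRAINT pk_claimpositions PRIMARY KEY (claimid, insid, poseffect),
    CONSTRAINT fk_claimpositions_claims
        FOREIGN KEY (claimid) REFERENCES dbo.claims (claimid)
);

-- Finding the positions for one place needs a join back to the main table,
-- because place is no longer a column of the sub-table.
SELECT p.claimid, p.insid, p.poseffect
FROM dbo.claimpositions AS p
JOIN dbo.claims AS c
    ON c.claimid = p.claimid
WHERE c.place = 'XYZ';
```

With the six-column key carried into the sub-table instead, the same query would be a single-table filter on place.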

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Jul 20 '05 #18
