What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000?

Andy

Hi All!

We are doing new development for SQL Server 2000 and also moving from
SQL 7.0 to SQL Server 2000.

What are cons and pros for using IDENTITY property as PK in SQL SERVER
2000?
Please, share your experience in using IDENTITY as PK .
Does SCOPE_IDENTITY makes life easier in SQL 2000?

Is there issues with DENTITY property when moving DB from one server
to another? (the same version of SQL Server)

Thank you in advance,
Andy

Jul 20 '05 #1

Subscribe Reply

112

10218

Aaron Bertrand [MVP]

> What are cons and pros for using IDENTITY property as PK in SQL SERVER

2000?
Pros:
- small (4 bytes)
- automatic
- relatively predictable (unlike GUID)
- more usable (try WHERE guidColumn = {AECB...} when debugging a problem)

Cons:
- meaningless identifier (this can also be a good thing)
- can have gaps (after delete or rollback)
- can't be used in some types of replication
- hotspot for insert if it is also clustered index
- not portable

We use it here because our natual keys are much larger than 4 bytes, and
this would be inefficient (especially in indexed foreign key constraints).
Does SCOPE_IDENTITY makes life easier in SQL 2000?
Yes, it is more reliable than @@IDENTITY... but I don't know how it would
make life easier.
Is there issues with DENTITY property when moving DB from one server
to another? (the same version of SQL Server)

Depends on how you define "move," and whether this is one-time or
continuous.

--
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/

Jul 20 '05 #2

Daniel Morgan

Aaron Bertrand [MVP] wrote:

What are cons and pros for using IDENTITY property as PK in SQL SERVER
2000?

Pros:
- small (4 bytes)
- automatic
- relatively predictable (unlike GUID)
- more usable (try WHERE guidColumn = {AECB...} when debugging a problem)

Cons:
- meaningless identifier (this can also be a good thing)
- can have gaps (after delete or rollback)
- can't be used in some types of replication
- hotspot for insert if it is also clustered index
- not portable

We use it here because our natual keys are much larger than 4 bytes, and
this would be inefficient (especially in indexed foreign key constraints).

What does 4 bytes have to do with it? If you had said 60 or 100 I'd
understand but why 4?

And please consider Joe Celko's voluminous comments on the subject of
artificial, or surrogate keys, when responding.

Thanks.

--
Daniel Morgan
http://www.outreach.washington.edu/e...ad/oad_crs.asp
http://www.outreach.washington.edu/e...oa/aoa_crs.asp
da******@x.washington.edu
(replace 'x' with a 'u' to reply)

Jul 20 '05 #3

Aaron Bertrand [MVP]

> What does 4 bytes have to do with it? If you had said 60 or 100 I'd

understand but why 4?
Uh, because INT (the most common datatype for IDENTITY) is 4 bytes?
And please consider Joe Celko's voluminous comments on the subject of
artificial, or surrogate keys, when responding.

Joe's a big boy, and he can speak for himself. So can I. We don't all have
to agree on everything. This is why it's called an opinion.

--
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/

Jul 20 '05 #4

Bruce Lewis

ne********@hotmail.com (Andy) writes:

What are cons and pros for using IDENTITY property as PK in SQL SERVER
2000? Please, share your experience in using IDENTITY as PK .

My experience says the theorists are right about the dangers of an
artificial primary key. Many real-world database problems stem from
duplicates that would never have been there with a natural primary key.
Natural primary keys also result in reports with fewer joins.

As for the pros of IDENTITY, if you are going to use an artificial
primary key, that's the way to do it. Triggers don't work as well. A
pro for artificial keys in general is that Microsoft products make
compound primary keys inconvenient. Transact-SQL doesn't have tuple
comparisons, e.g. (a, b, c) = (x, y, z), making joins inconvenient.
ASP.NET components that do DataBind() don't handle compound keys at all.
AFAICT, you can only set a KeyColumn parameter to a single column.

I'd personally recommend going with natural primary keys, even if
they're compound.

Jul 20 '05 #5

Louis Davidson

> My experience says the theorists are right about the dangers of an

artificial primary key. Many real-world database problems stem from
duplicates that would never have been there with a natural primary key.
This is only partially true, since what is actually missing is analysis.
Tables can have many keys, and should if there are several reasonable
natural keys, or in this case artificial keys.
Natural primary keys also result in reports with fewer joins.
Only when you make the keys very descriptive. This kind of thing has always
made for an ugly balance of performance (smaller keys, resulting in values
that need 20 characters being condensed into 5) and usability. Joins on
very small values are very fast.
As for the pros of IDENTITY, if you are going to use an artificial
primary key, that's the way to do it. Triggers don't work as well. A
What do you mean by triggers don't work as well? There are ways to create
artificial keys using triggers.
pro for artificial keys in general is that Microsoft products make
compound primary keys inconvenient. Transact-SQL doesn't have tuple
comparisons, e.g. (a, b, c) = (x, y, z), making joins inconvenient.
What SQL syntax has this kind of comparison? I would have figured (a,b,c)
to be a set, not three different columns.
I'd personally recommend going with natural primary keys, even if
they're compound.

There is nothing wrong with that statement, whatsoever. Natural keys have
value, I just like the consistency of the same pattern being used for all
tables.

--
----------------------------------------------------------------------------
-----------
Louis Davidson (dr***@hotmail.com)
Compass Technology Management

Pro SQL Server 2000 Database Design
http://www.apress.com/book/bookDisplay.html?bID=266

Note: Please reply to the newsgroups only unless you are
interested in consulting services. All other replies will be ignored :)

Jul 20 '05 #6

Bob Badour

"Bruce Lewis" <br*****@yahoo.com> wrote in message
news:nm*************@scrubbing-bubbles.mit.edu...

ne********@hotmail.com (Andy) writes:
What are cons and pros for using IDENTITY property as PK in SQL SERVER
2000? Please, share your experience in using IDENTITY as PK .
My experience says the theorists are right ...

.... I'd personally recommend going with natural primary keys, even if
they're compound.

The theorists disagree with you with respect to compound primary keys --
especially in SQL. Candidate keys obviously have as many attributes as they
have, but forming references with compound keys causes severe problems when
information may be missing.

Jul 20 '05 #7

David Portas

> > comparisons, e.g. (a, b, c) = (x, y, z), making joins inconvenient.

What SQL syntax has this kind of comparison? I would have figured (a,b,c)
to be a set, not three different columns.

Row-value comparisons such as this are standard in SQL92 but unfortunately
not supported by SQLServer. Oracle, I believe, does support this feature.

--
David Portas
------------
Please reply only to the newsgroup
--

Jul 20 '05 #8

Daniel Morgan

Bob Badour wrote:

<snipped>

The theorists disagree with you with respect to compound primary keys --
especially in SQL. Candidate keys obviously have as many attributes as they
have, but forming references with compound keys causes severe problems when
information may be missing.

I disagree. Theorists do not disagree at all with respect to compound
primary keys. Primary keys, by definition, don't have missing information.

If you are missing information you have something but that something is
not, by definition, a primary key.
--
Daniel Morgan
http://www.outreach.washington.edu/e...ad/oad_crs.asp
http://www.outreach.washington.edu/e...oa/aoa_crs.asp
da******@x.washington.edu
(replace 'x' with a 'u' to reply)

Jul 20 '05 #9

Bob Badour

"Daniel Morgan" <da******@x.washington.edu> wrote in message
news:1070326393.444066@yasure...

Bob Badour wrote:

<snipped>
The theorists disagree with you with respect to compound primary keys --
especially in SQL. Candidate keys obviously have as many attributes as they have, but forming references with compound keys causes severe problems when information may be missing.

I disagree. Theorists do not disagree at all with respect to compound
primary keys. Primary keys, by definition, don't have missing information.

If you are missing information you have something but that something is
not, by definition, a primary key.

Since when does that have any bearing on missing data in the referencing
table?

Jul 20 '05 #10

Louis Davidson

While it is true that the chosen primary key cannot contain any optional
values, it is more the praticioner (sp?) that disagrees with this stance.
Compound keys are unwieldy and bad for performance, but the theorist in me
says "who cares?" It is all about what is right/best, not what is
fastest/easiest.

--
----------------------------------------------------------------------------
-----------
Louis Davidson (dr***@hotmail.com)
Compass Technology Management

Pro SQL Server 2000 Database Design
http://www.apress.com/book/bookDisplay.html?bID=266

Note: Please reply to the newsgroups only unless you are
interested in consulting services. All other replies will be ignored :)

"Bob Badour" <bb*****@golden.net> wrote in message
news:pf********************@golden.net...

"Bruce Lewis" <br*****@yahoo.com> wrote in message
news:nm*************@scrubbing-bubbles.mit.edu...
ne********@hotmail.com (Andy) writes:
What are cons and pros for using IDENTITY property as PK in SQL SERVER
2000? Please, share your experience in using IDENTITY as PK .
My experience says the theorists are right ...

...
I'd personally recommend going with natural primary keys, even if
they're compound.

The theorists disagree with you with respect to compound primary keys --
especially in SQL. Candidate keys obviously have as many attributes as

they have, but forming references with compound keys causes severe problems when information may be missing.

Jul 20 '05 #11

Bob Badour

"Louis Davidson" <dr***************@hotmail.com> wrote in message
news:#r**************@tk2msftngp13.phx.gbl...

While it is true that the chosen primary key cannot contain any optional
values, it is more the praticioner (sp?) that disagrees with this stance.
Compound keys are unwieldy and bad for performance, but the theorist in me
says "who cares?" It is all about what is right/best, not what is
fastest/easiest.
Performance is determined by the physical structure and not by the logical
interface. Legitimate theorists have written ad nauseum on the severe
logical problems caused by using compound keys for references when data may
be missing.

-------------------------------------------------------------------------- -- -----------
Louis Davidson (dr***@hotmail.com)
Compass Technology Management

Pro SQL Server 2000 Database Design
http://www.apress.com/book/bookDisplay.html?bID=266

Note: Please reply to the newsgroups only unless you are
interested in consulting services. All other replies will be ignored :)

"Bob Badour" <bb*****@golden.net> wrote in message
news:pf********************@golden.net...

"Bruce Lewis" <br*****@yahoo.com> wrote in message
news:nm*************@scrubbing-bubbles.mit.edu...
ne********@hotmail.com (Andy) writes:

> What are cons and pros for using IDENTITY property as PK in SQL SERVER > 2000? Please, share your experience in using IDENTITY as PK .

My experience says the theorists are right ...

...
I'd personally recommend going with natural primary keys, even if
they're compound.

The theorists disagree with you with respect to compound primary keys --
especially in SQL. Candidate keys obviously have as many attributes as

they
have, but forming references with compound keys causes severe problems

when
information may be missing.

Jul 20 '05 #12

Aaron Bertrand - MVP

> "who cares?" It is all about what is right/best, not what is

fastest/easiest.

Unfortunately, what is right/best is not always among the criteria when the
work is for someone else.

--
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/

Jul 20 '05 #13

BenignVanilla

"Bruce Lewis" <br*****@yahoo.com> wrote in message
news:nm*************@scrubbing-bubbles.mit.edu...

ne********@hotmail.com (Andy) writes:
What are cons and pros for using IDENTITY property as PK in SQL SERVER
2000? Please, share your experience in using IDENTITY as PK .

My experience says the theorists are right about the dangers of an
artificial primary key. Many real-world database problems stem from
duplicates that would never have been there with a natural primary key.
Natural primary keys also result in reports with fewer joins.

As for the pros of IDENTITY, if you are going to use an artificial
primary key, that's the way to do it. Triggers don't work as well. A
pro for artificial keys in general is that Microsoft products make
compound primary keys inconvenient. Transact-SQL doesn't have tuple
comparisons, e.g. (a, b, c) = (x, y, z), making joins inconvenient.
ASP.NET components that do DataBind() don't handle compound keys at all.
AFAICT, you can only set a KeyColumn parameter to a single column.

I'd personally recommend going with natural primary keys, even if
they're compound.

I am by no means a SQL expert, so forgive me if this seems ignorant...But
why can't the ID columm be a natural key? For example, I am working on a
project that has a vendors table. The list of vendors is used in
relationship to several other tables. We build this table with an identity
column, and a column with the vendor's name. Now when a vendor is added to
the table, they are assigned a unique ID that ties all other related data
back to this vendor, and in the case of a vendor changing their name, or a
typo, we can make updates without affecting data. Seems like a perfect use
for an identity field, and it is our primary key.
--
BV.
WebPorgmaster - www.IHeartMyPond.com
Work at Home, Save the Environment - www.amothersdream.com

Jul 20 '05 #14

Aaron Bertrand - MVP

> I am by no means a SQL expert, so forgive me if this seems ignorant...But

why can't the ID columm be a natural key? For example, I am working on a
project that has a vendors table. The list of vendors is used in
relationship to several other tables. We build this table with an identity

An identity value that is generated by the system is not "natural"... a
natural key means that the key is, by nature, identifying a single row...
not artificially because you generated some value for it. A natural key
could be an e-mail address, or a social security number, or a license plate
number, or a latitude and longitude -- something that is part of the data
that also happens to uniquely identify it.

Keep in mind that a primary key does not have to a natural key. I don't see
any problems with using an IDENTITY as a primary key, such as in your case.
But in your case it is not a natural key.

--
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/

Jul 20 '05 #15

David Portas

IDENTITY may be a *surrogate* key but it isn't a *natural* key because it
bears no relation to the entity that you are modelling in your table. A
natural primary key is a subset of the attributes of an entity which
uniquely identify that entity. IDENTITY clearly isn't an attribute of any
real entity - it's just an arbitrary number.

Taking your Vendors table as an example, a naive design might look like
this:

CREATE TABLE Vendors (vendor_id INTEGER IDENTITY PRIMARY KEY /* ?? */,
vendor_name VARCHAR(40) NOT NULL, vendor_tax_id VARCHAR(10) NOT NULL, ...)

But this table has no uniqueness or integrity because multiple vendors can
exist with different (arbitrary) vendor_ids. It may well work internally for
a particular application but will break when someone needs to do some real
analysis on your data. And what if you need to combine it with data from
another system that doesn't have that same magical Vendor_id column?

Here's a better alternative:

CREATE TABLE Vendors (vendor_id INTEGER NOT NULL UNIQUE /* surrogate */,
vendor_name VARCHAR(40) NOT NULL UNIQUE, vendor_tax_id VARCHAR(10) NOT NULL
PRIMARY KEY, ...)

By declaring UNIQUE / PK constraints on the correct attributes you can
ensure that you have verifiably unique data. Keep the surrogate key if you
like but make sure you declare the Natural key as well. (Moving the actual
PK declaration is essentially cosmetic - PK is equivalent to NOT NULL UNIQUE
and it's not unusual to have several NOT NULL UNIQUE keys in a table).

Here's Celko on keys:

http://www.intelligententerprise.com...celko1_1.shtml

--
David Portas
------------
Please reply only to the newsgroup
--

Jul 20 '05 #16

Steve Kass

Aaron Bertrand - MVP wrote:

I am by no means a SQL expert, so forgive me if this seems ignorant...But
why can't the ID columm be a natural key? For example, I am working on a
project that has a vendors table. The list of vendors is used in
relationship to several other tables. We build this table with an identity
An identity value that is generated by the system is not "natural"... a
natural key means that the key is, by nature, identifying a single row...
not artificially because you generated some value for it. A natural key
could be an e-mail address, or a social security number, or a license plate
number, or a latitude and longitude -- something that is part of the data
that also happens to uniquely identify it.

Quiz: Classify each key below as "natural" or "artificial":

Northwind..Customers.CustomerID
Northwind..Orders.OrderID
Northwind..Territories.TerritoryID

Most keys are fundamentally artificial, but somehow we only call them
artificial if we made them up, not if someone else made them up. Social
Security numbers are probably nothing different than identity values in
someone elses database, which doesn't make them any more intrinsic to
living people than VendorID values. I don't think an identity VendorID
value is any more artificial than any other unique way of identifying a
vendor, so long as it is assigned the first time a vendor enters the
system and is never changed.

I think this whole natural/artifical distinction is mostly quite silly.
In a well-designed database, entities can just as easily be identified
by an integer they are assigned when they enter the system as they can
by some set of attributes within the data they enter the system with.
The arguments against using identity values always seem to be arguments
against the ways people misuse identity values.

The advantage of keys like Northwind..Customers.CustomerID is really an
error-correction issue. If carefully chosen, those 5-character keys can
be recovered if a single letter is mistyped. But this can be done with
artificial values also, if check digits or other error-correcting
schemes are used.

SK

Keep in mind that a primary key does not have to a natural key. I don't see
any problems with using an IDENTITY as a primary key, such as in your case.
But in your case it is not a natural key.

Jul 20 '05 #17

David Portas

> living people than VendorID values. I don't think an identity VendorID

value is any more artificial than any other unique way of identifying a
vendor, so long as it is assigned the first time a vendor enters the
system and is never changed.

But the point is that a "natural" key is verifiable outside of the system.

When I see a NG post from "skass[at]drew.edu" I don't care whether that's
based on your "real" name or even whether S.Kass is the same name as on your
passport or driver's licence. What's important to me is that it's determined
by a consistent method outside of the system which gives me some acceptable
degree of confidence that you're the same person who posted here as
"skass[at]drew.edu" yesterday. Of course that validity is destroyed if you
change your email address or if someone spoofs your address. But it's still
intrinsically better than an arbitrary ID allocated by the server.

--
David Portas
------------
Please reply only to the newsgroup
--

Jul 20 '05 #18

Aaron Bertrand - MVP

> artificial if we made them up, not if someone else made them up. Social

Security numbers are probably nothing different than identity values in
someone elses database,

Right, but this is a centralized and controlled database. Nobody else has
my SSN (though there are exceptions, e.g. someday SSNs for deceased people
will have to be re-used); anybody else who uses it to identify themselves is
likely attempting fraud / identity theft. I consider it a "natural" key
because I supply it to the database, rather than the other way around.

Whereas my customerID according to Barnes & Noble is very unlikely to be the
same as my customerID at J.Crew.

In any case, I do agree that the distinction is largely silly, especially
when it erupts into arguments and "but Celko says..." nonsense. Like many
other things in the database world, the choice of a key is not dictated by
some higher power, but is rather situation-dependent.

--
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/

Jul 20 '05 #19

Bob Badour

"David Portas" <RE****************************@acm.org> wrote in message
news:e-********************@giganews.com...

living people than VendorID values. I don't think an identity VendorID
value is any more artificial than any other unique way of identifying a
vendor, so long as it is assigned the first time a vendor enters the
system and is never changed.

But the point is that a "natural" key is verifiable outside of the system.

As soon as one records the generated identity value outside the system, the
key is verifiable outside of the system. All keys are surrogates or
artificial keys. Natural keys are nothing more than familiar surrogates.

Jul 20 '05 #20

Steve Kass

David Portas wrote:

living people than VendorID values. I don't think an identity VendorID
value is any more artificial than any other unique way of identifying a
vendor, so long as it is assigned the first time a vendor enters the
system and is never changed.
But the point is that a "natural" key is verifiable outside of the system.

I guess it depends on where you draw the boundaries of "the system." If
you need something outside of "the system" to verify your key, aren't
you just working within a larger system (and one that is not entirely
under your control)?

When I see a NG post from "skass[at]drew.edu" I don't care whether that's
based on your "real" name or even whether S.Kass is the same name as on your
passport or driver's licence. What's important to me is that it's determined
by a consistent method outside of the system which gives me some acceptable
degree of confidence that you're the same person who posted here as
"skass[at]drew.edu" yesterday. Of course that validity is destroyed if you
change your email address or if someone spoofs your address. But it's still
intrinsically better than an arbitrary ID allocated by the server.
This is a good point. If entities enter your system from time to time,
and you must determine whether they duplicate entities already in your
system or are new, then you need some "natural" method of
identification. If I apply for a Microsoft credit card, Microsoft will
likely generate an artifical credit card number for me. But that can't
be the only way in which Microsoft can identify me if a business rule
prohibits one person from possessing two Microsoft credit cards

A nice way to look at this is by recognizing when there is and when
there isn't some external entity that participates in an internal
business rule. The need to use a "natural" key such as my social
security number exists only because there is an entity outside
Microsoft's credit department (the
person-registered-with-the-social-security-administration entity) that
participates in a business rule: that the cardinality of the PRWTSSA <->
CCP relation is required to be 1<->{0,1}.

This is a useful way to look at things. Is an identity value VendorID a
good key for a vendor? It depends on whether there is some entity
"outside the system" with a certain relationship to the entity
identified by VendorID. If there is, then VendorID is not suitable as
the only key. If it is used, it must be a surrogate for a natural
external key also recorded within the system. Some businesses may allow
one corporation more than one VendorID, and others may not.

Is an identity value InvoiceID a good key for invoices (in the database
of the business generating the invoices)? It might well be, since
invoices can be internal to the system. While there might be a more
"natural" key, such as (CreationDatetime, IssuingEmployee_or_System),
the natural key might not provide any added value beyond its individual
attribute values if invoices are entirely internal.

In some cases, what appear to be external entities can use internal
(artificial) keys, because there is no business rule relating the
internal and external entities. An example might be a deli counter
customer who pulls an identity value from the "take a ticket" machine.
Here the only confusion is that there is no handy word for the
human-visit-to-deli-counter entity, and Customer might be a more
convenient name. Despite the fact that customers are people and people
can be identified uniquely outside the system, there's no need to worry
about that to manage visits to the deli counter.

I'm curious now to know whether this point of view helps make a little
more sense of the big debate. If not that, maybe it will at least help
me understand why I get anxious every time I need to use a Microsoft
Passport for identification.

SK

Jul 20 '05 #21

Louis Davidson

The problem is with how it is used. If you start giving users access to
identity based values, you get into a bad spot where they want to make
changes to the value (in my line of work, we don't like the numbers 666 in
account numbers) so using identities for user values is a bad idea. I use
them only for internal pointers that are never presented to users, since
they are not modifiable. I could use guids, or characters, or whatever for
keys and no one would be the wiser.

Natural keys are nothing more than familiar surrogates.
You are kind of right here, but it is generally true that natural keys can
change, because in the world, things can change. Identities cannot change.

--
----------------------------------------------------------------------------
-----------
Louis Davidson (dr***@hotmail.com)
Compass Technology Management

Pro SQL Server 2000 Database Design
http://www.apress.com/book/bookDisplay.html?bID=266

Note: Please reply to the newsgroups only unless you are
interested in consulting services. All other replies will be ignored :)

"Bob Badour" <bb*****@golden.net> wrote in message
news:IO********************@golden.net... "David Portas" <RE****************************@acm.org> wrote in message
news:e-********************@giganews.com...
living people than VendorID values. I don't think an identity VendorID value is any more artificial than any other unique way of identifying a vendor, so long as it is assigned the first time a vendor enters the
system and is never changed.
But the point is that a "natural" key is verifiable outside of the

system.
As soon as one records the generated identity value outside the system, the key is verifiable outside of the system. All keys are surrogates or
artificial keys. Natural keys are nothing more than familiar surrogates.

Jul 20 '05 #22

Trey Walpole

"David Portas" <RE****************************@acm.org> wrote in message
news:e-********************@giganews.com...

living people than VendorID values. I don't think an identity VendorID
value is any more artificial than any other unique way of identifying a
vendor, so long as it is assigned the first time a vendor enters the
system and is never changed.
But the point is that a "natural" key is verifiable outside of the system.

When I see a NG post from "skass[at]drew.edu" I don't care whether that's
based on your "real" name or even whether S.Kass is the same name as on

your passport or driver's licence. What's important to me is that it's determined by a consistent method outside of the system which gives me some acceptable degree of confidence that you're the same person who posted here as
"skass[at]drew.edu" yesterday. Of course that validity is destroyed if you
change your email address or if someone spoofs your address. But it's still intrinsically better than an arbitrary ID allocated by the server.

I am a firm believer that natural keys should only be used to logically
design/normalize the data. When it comes to the real reason for keys, data
integrity, more often than not I have seen that natural keys are
intrinsically not good physical primary keys.

1. Natural keys are, being natural and therefore user entered [i.e.,
provided to the database by external means], fungible. If a user enters
data, they must also be able to modify it. If data can be modified, then its
value as a systemic primary key is gone. Yes, you can cascade updates to
these, but why do it when it can be avoided to start with.

2. Natural keys are typically a composite of atomic attributes. If using a
composite, these must be propagated to referencing tables as foreign keys.
Your normalization drops below par, by having these [potentially] massively
duplicated columns.
Attributes that are single, [supposedly] unique attributes (e.g., SSN),
usually represent some official, governmentally recognized ID, and therefore
have legal issues with being propagated throughout a system.

Also, for amateurs and many professionals, natural keys are very often
chosen incorrectly. e.g., I believe some combination of Name and other info
has been used by my ISP as their primary key. My last name was entered into
their system incorrectly, but they cannot fix it because their system will
not allow it. Preposterous and poor design.

Surrogate keys generated by using the identity property are ideal for data
integrity, because
1. They are static values [i.e., once entered, it does not change] and the
DBA has control over allowing values in identity columns to be modified.
2. They are singleton row ids. The fact that they are sequential is
irrelevant. That is simply the most efficient means of generating new
numeric values.

Identity integers can be problematic in two-way replication, but proper
management of key ranges can alleviate these issues. GUIDs are the MS
recommended way to deal with distributed data and two-way replication, but
not as easy to deal with in unreplicated databases.

Jul 20 '05 #23

Trevor Best

On Tue, 2 Dec 2003 09:04:35 -0500 in comp.databases, "Bob Badour"
<bb*****@golden.net> wrote:

"Louis Davidson" <dr***************@hotmail.com> wrote in message
news:#r**************@tk2msftngp13.phx.gbl...
While it is true that the chosen primary key cannot contain any optional
values, it is more the praticioner (sp?) that disagrees with this stance.
Compound keys are unwieldy and bad for performance, but the theorist in me
says "who cares?" It is all about what is right/best, not what is
fastest/easiest.

Performance is determined by the physical structure and not by the logical
interface. Legitimate theorists have written ad nauseum on the severe
logical problems caused by using compound keys for references when data may
be missing.

Are you talking about data being missing from a foreign key side of a
relationship? I'm pro identity column myself but I don't see how that
would help in this instance.

--
A)bort, R)etry, I)nfluence with large hammer.

Jul 20 '05 #24

Trevor Best

On Tue, 2 Dec 2003 17:53:44 -0000 in comp.databases, "David Portas"
<RE****************************@acm.org> wrote:

living people than VendorID values. I don't think an identity VendorID
value is any more artificial than any other unique way of identifying a
vendor, so long as it is assigned the first time a vendor enters the
system and is never changed.

But the point is that a "natural" key is verifiable outside of the system.

When I see a NG post from "skass[at]drew.edu" I don't care whether that's
based on your "real" name or even whether S.Kass is the same name as on your
passport or driver's licence. What's important to me is that it's determined
by a consistent method outside of the system which gives me some acceptable
degree of confidence that you're the same person who posted here as
"skass[at]drew.edu" yesterday. Of course that validity is destroyed if you
change your email address or if someone spoofs your address. But it's still
intrinsically better than an arbitrary ID allocated by the server.

If Steve leaves that educational facility he's at now, his email
address will surely change. If he goes into a witness protection
scheme his name, address and SS number will change, he or someone else
could change quiet a bit about him but if he's on a database with an
identity column as his PK then it's more likely that it will *never*
change.

--
A)bort, R)etry, I)nfluence with large hammer.

Jul 20 '05 #25

Steve Kass

Trevor Best wrote:

On Tue, 2 Dec 2003 17:53:44 -0000 in comp.databases, "David Portas"
<RE****************************@acm.org> wrote:

living people than VendorID values. I don't think an identity VendorID
value is any more artificial than any other unique way of identifying a
vendor, so long as it is assigned the first time a vendor enters the
system and is never changed.

But the point is that a "natural" key is verifiable outside of the system.

When I see a NG post from "skass[at]drew.edu" I don't care whether that's
based on your "real" name or even whether S.Kass is the same name as on your
passport or driver's licence. What's important to me is that it's determined
by a consistent method outside of the system which gives me some acceptable
degree of confidence that you're the same person who posted here as
"skass[at]drew.edu" yesterday. Of course that validity is destroyed if you
change your email address or if someone spoofs your address. But it's still
intrinsically better than an arbitrary ID allocated by the server.

If Steve leaves that educational facility he's at now, his email
address will surely change. If he goes into a witness protection
scheme his name, address and SS number will change, he or someone else
could change quiet a bit about him but if he's on a database with an
identity column as his PK then it's more likely that it will *never*
change.

In fact none of the 257 different identity values assigned to me will ever
get changed - unless I'm organized and honest, and unless someone
checks some kind of natural key of mine, how will anyone know
it was me each of those 257 times I opened an account?

SK

Jul 20 '05 #26

Greg D. Moore \(Strider\)

"BenignVanilla" <bv@tibetanbeefgarden.com> wrote in message
news:O5********************@giganews.com...

I am by no means a SQL expert, so forgive me if this seems ignorant...But
why can't the ID columm be a natural key? For example, I am working on a
project that has a vendors table. The list of vendors is used in
relationship to several other tables. We build this table with an identity
column, and a column with the vendor's name. Now when a vendor is added to
the table, they are assigned a unique ID that ties all other related data
back to this vendor, and in the case of a vendor changing their name, or a
typo, we can make updates without affecting data. Seems like a perfect use
for an identity field, and it is our primary key.
And what happens if someone does a DBCC checkident ('FOO', RESEED)?

Or you have to copy it into a new table and accidently set that table up
with an IDENTITY column and now all your rows get new IDs?

--
BV.
WebPorgmaster - www.IHeartMyPond.com
Work at Home, Save the Environment - www.amothersdream.com

Jul 20 '05 #27

Trey Walpole

> And what happens if someone does a DBCC checkident ('FOO', RESEED)?

Or you have to copy it into a new table and accidently set that table up
with an IDENTITY column and now all your rows get new IDs?

Fire the DBA. If they've allowed unskilled people such access or do anything
accidentally, no telling what other problems they'll cause ;)

Jul 20 '05 #28

Greg D. Moore \(Strider\)

"Trey Walpole" <tr********@SPcomcastAM.net> wrote in message
news:uP**************@TK2MSFTNGP12.phx.gbl...

And what happens if someone does a DBCC checkident ('FOO', RESEED)?

Or you have to copy it into a new table and accidently set that table up
with an IDENTITY column and now all your rows get new IDs?
Fire the DBA. If they've allowed unskilled people such access or do

anything accidentally, no telling what other problems they'll cause ;)
Wow. Can I get a job where you work where folks never make mistakes?

Seriously, a DBCC checkident can be necessary in some recovery scenarios.

Copying a table over into another one is often necessary in general
maintenance, schema changes, etc. It's pretty easy to forget to do it
right.

Admittedly, they are pretty contrived examples, but the point is, the value
of the identity relies on some arbitrary state of the DB at the point in
time it is created.

Now, in some cases that just might not matter, but in many cases it can be
an important factor.

Jul 20 '05 #29

Stijn Verrept

I've read through this thread but I don't understand it. I always use an
int or smallint as primary key, with identity. I believe it would be a mess
otherwise.

Example: I have a table with people, last name, first name, address, ... So
suppose you would make a natural key then you need at least the last name
and the first name. I have >25 other tables that reference that table. If
I get this right I will need to use the name and firstname field in all the
other tables as well to reference. Isn't that just a lot of data waste? If
I'm missing something, please tell me what because this seems a bit silly.
Stijn Verrept.

Jul 20 '05 #30

Trey Walpole

"Greg D. Moore (Strider)" <mo*****@greenms.com> wrote in message
news:IL********************@twister.nyroc.rr.com.. .

"Trey Walpole" <tr********@SPcomcastAM.net> wrote in message
news:uP**************@TK2MSFTNGP12.phx.gbl...
And what happens if someone does a DBCC checkident ('FOO', RESEED)?

Or you have to copy it into a new table and accidently set that table up with an IDENTITY column and now all your rows get new IDs?
Fire the DBA. If they've allowed unskilled people such access or do

anything
accidentally, no telling what other problems they'll cause ;)

Wow. Can I get a job where you work where folks never make mistakes?

Seriously, a DBCC checkident can be necessary in some recovery scenarios.

Copying a table over into another one is often necessary in general
maintenance, schema changes, etc. It's pretty easy to forget to do it
right.

I was, of course, being facetious -- well, mostly. :*) Mistakes do happen,
but in most situations in db management, like these mentioned, they are
completely avoidable.

The DBA should be the one doing the DBCC CHECKIDENT - not just "someone".
And if it is an approveed someone else, the DBA better know about it.

Copying a table's data over is often necessary, but you do need to be very
careful and know all the things that might be affected. Again, since this is
a DBA responsibility, he'd better know how to do it right and have some test
db to work with.
[And if that scenario happens, it's pretty easy to fix, although it does
mean moving data all over again.]
Admittedly, they are pretty contrived examples, but the point is, the value of the identity relies on some arbitrary state of the DB at the point in
time it is created.

Now, in some cases that just might not matter, but in many cases it can be
an important factor.

Also very true. But it is the DBA that has [or should have] the control over
any modifications that affect identity values, whereas any user has the
ability to change natural keys [which was the point I was making :)]

Jul 20 '05 #31

Greg D. Moore \(Strider\)

"Trey Walpole" <tr********@SPcomcastAM.net> wrote in message
news:uN**************@TK2MSFTNGP10.phx.gbl...

"Greg D. Moore (Strider)" <mo*****@greenms.com> wrote in message
news:IL********************@twister.nyroc.rr.com.. .

I was, of course, being facetious -- well, mostly. :*) Mistakes do happen,
but in most situations in db management, like these mentioned, they are
completely avoidable.

Oh I know. :-)

The DBA should be the one doing the DBCC CHECKIDENT - not just "someone".
And if it is an approveed someone else, the DBA better know about it.
Keep in mind not all companies have that level of experience. I've
consulted for a few.

Copying a table's data over is often necessary, but you do need to be very
careful and know all the things that might be affected. Again, since this is a DBA responsibility, he'd better know how to do it right and have some test db to work with.
[And if that scenario happens, it's pretty easy to fix, although it does
mean moving data all over again.]
Actually it's impossible to fix if you've deleted the original table since
you no longer have the original ID numbers.

Admittedly, they are pretty contrived examples, but the point is, the value
of the identity relies on some arbitrary state of the DB at the point in
time it is created.

Now, in some cases that just might not matter, but in many cases it can be an important factor.

Also very true. But it is the DBA that has [or should have] the control

over any modifications that affect identity values, whereas any user has the
ability to change natural keys [which was the point I was making :)]

In an ideal situation, I'd agree.

Jul 20 '05 #32

Aaron Bertrand [MVP]

You have definitely hit a couple of the strong points in IDENTITY's favor.
But there are cases where a "natural" kind of key can be small as well.
Consider when eBay bought PayPal (and other than that, the rest of this
paragraph is completely fictional), they likely had to merge some data...
perhaps eBay used an IDENTITY to generate customer numbers, but they want to
align those primary keys with the new data in the PayPal tables. So, the
keys in the PayPal data become INTs, but not IDENTITY. They are kind of
"natural" because they came to the PayPal from an external source, so to
speak, rather than generated arbitrarily from within.

Of course, completely fictional. But surely you can see that not all
natural keys are going to be larger than an IDENTITY, or less efficient.
There are other examples, too. In a small stats system, a SMALLDATETIME
could be the primary key (perhaps several subrelated tables are organized by
day). In fact, part of http://www.aspfaq.com/stats.asp (and plenty more
that you can't see) is derived on a set of tables where SMALLDATETIME is the
only key of relevance. Okay, so that's still 4 bytes, but you save 4 if
your other alternative is to store an IDENTITY along with the SMALLDATETIME
value. Consider:

CREATE TABLE calendar
(
dateValue SMALLDATETIME PRIMARY KEY
)

vs.

CREATE TABLE calendar
(
dateID INT IDENTITY PRIMARY KEY,
dateValue SMALLDATETIME NOT NULL
)

Never mind my goofy naming scheme. :-)

Now, Kass could probably show me some cool dateadd tricks that would allow
me to store just an INT (or maybe even a SMALLINT, depending on the date
range required), and determine what the date value is at runtime. Not that
I think that's what his argument would be, but rather just to show that it
is still possible to choose either route. I think the usability of the date
value representing what it is, rather than having to derive its value from
some formula, is a good thing.

In cases like e-mail address and SSN (and in fact most cases), I still
prefer your route, where there is a surrogate key (IDENTITY) that prevents
me from having to cascade changes all over the place, and store larger
foreign keys.

Firstname + lastname is obviously a bad choice for a key of any kind,
because I know more than one Aaron Bertrand. So then you bring middle name
into the key, and it can still be repeated. Other things like getting
adopted, re-married, legally changing their name, and other reasons why this
"key" would change are minor; changes to the key can be dealt with in the
database using DRI/CASCADE or, worst case scenario, through rigorous update
code; it will be tougher to re-train users to look up all the tables
containing "Carmen Bertrand" instead of "Carmen Electra." :-) However, I
think the possibility of two people having the same key is a far more
compelling argument for bypassing the natural key and placing some
meaningless identifier, like IDENTITY, that the user doesn't care about and
would never have to change.

Now, you might think, "why not bring SSN into the FirstName + MiddleName +
LastName key? That would make it unique." Yes, and hideously large. If
SSN is unique, then why not just use SSN as the key? Again, it's large even
on its own (CHAR(9) and surely to become CHAR(10) in our lifetimes), so I
fail to see the benefit of repeating the value in every related table, DRI
or not.

Sorry about the earful, sometimes I get a little typographical diarrhea.
Hopefully that was at least marginally intelligible.

--
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/

"Stijn Verrept" <sv******@nospan.vub.ac.be> wrote in message
news:#G**************@TK2MSFTNGP10.phx.gbl...

I've read through this thread but I don't understand it. I always use an
int or smallint as primary key, with identity. I believe it would be a mess otherwise.

Example: I have a table with people, last name, first name, address, ... So suppose you would make a natural key then you need at least the last name
and the first name. I have >25 other tables that reference that table. If I get this right I will need to use the name and firstname field in all the other tables as well to reference. Isn't that just a lot of data waste? If I'm missing something, please tell me what because this seems a bit silly.
Stijn Verrept.

Jul 20 '05 #33

Aaron Bertrand [MVP]

> Actually it's impossible to fix if you've deleted the original table since

you no longer have the original ID numbers.

Hopefully you're not starting a DTS task, never mind dropping a table before
validating a successful transfer, without a decent backup in place. I think
this is the kind of thing that Trey means when he uses the term
"avoidable"... I'll stretch it here to also mean "correctable."

Jul 20 '05 #34

Louis Davidson

"Steve Kass" <sk***@drew.edu> wrote in message
news:3F**************@drew.edu...

In fact none of the 257 different identity values assigned to me will ever
get changed - unless I'm organized and honest, and unless someone
checks some kind of natural key of mine, how will anyone know
it was me each of those 257 times I opened an account?

Those values were not assigned to you, they were assigned to 257 different
accounts. Those 257 accounts will also have 257 natural keys most likely,
which will be an account number, likely taken from a paper form in the bank
office.

If you are talking about you giving 257 different identities (you tell them
you are different people) then the fake people are getting new identity
values, and pretty soon they will get a new artificial key printed on a
fashionable orange jumpsuit :)

--
----------------------------------------------------------------------------
-----------
Louis Davidson (dr***@hotmail.com)
Compass Technology Management

Pro SQL Server 2000 Database Design
http://www.apress.com/book/bookDisplay.html?bID=266

Note: Please reply to the newsgroups only unless you are
interested in consulting services. All other replies will be ignored :)

Jul 20 '05 #35

Stijn Verrept

> In cases like e-mail address and SSN (and in fact most cases), I still

prefer your route, where there is a surrogate key (IDENTITY) that prevents
me from having to cascade changes all over the place, and store larger
foreign keys.
Ok I agree that sometimes natural keys are good. But mind the word:
sometimes :). After reading through this thread I got the impression that
natural keys are used more than surrogate keys, while in practice I use
identity with surrogate keys almost all the time.
Now, you might think, "why not bring SSN into the FirstName + MiddleName +
LastName key? That would make it unique." Yes, and hideously large. If
SSN is unique, then why not just use SSN as the key? Again, it's large even on its own (CHAR(9) and surely to become CHAR(10) in our lifetimes), so I
fail to see the benefit of repeating the value in every related table, DRI
or not.
Indeed, and I don't know about American legislation but maybe in the future
(or even now) you can have people work for you who don't have a SSN (who
work from a distance country for example) and then you'll get stuck again.
Sorry about the earful, sometimes I get a little typographical diarrhea.
Hopefully that was at least marginally intelligible.

It was :)
Stijn Verrept.

Jul 20 '05 #36

Aaron Bertrand [MVP]

> sometimes :). After reading through this thread I got the impression that

natural keys are used more than surrogate keys, while in practice I use
identity with surrogate keys almost all the time.

I don't get that impression at all, and I don't believe it is true. Maybe
that's what purists would *like* however...

--
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/

Jul 20 '05 #37

Bob Badour

"Louis Davidson" <dr***************@hotmail.com> wrote in message
news:ex**************@tk2msftngp13.phx.gbl...

The problem is with how it is used. If you start giving users access to
identity based values, you get into a bad spot where they want to make
changes to the value (in my line of work, we don't like the numbers 666 in
account numbers) so using identities for user values is a bad idea. I use
them only for internal pointers that are never presented to users, since
they are not modifiable. I could use guids, or characters, or whatever for keys and no one would be the wiser.
Keys are logical identifiers. They identify data for the user as well as for
the dbms. Preventing the user from seeing the identifier is just stupid.

Natural keys are nothing more than familiar surrogates.

You are kind of right here, but it is generally true that natural keys can
change, because in the world, things can change. Identities cannot

change.

It is not a "kind of" rightness. The statement is obviously and
self-evidently right to anyone with at least a minimal education in the
fundamentals of data management.

--
-------------------------------------------------------------------------- -- -----------
Louis Davidson (dr***@hotmail.com)
Compass Technology Management

Pro SQL Server 2000 Database Design
http://www.apress.com/book/bookDisplay.html?bID=266

Note: Please reply to the newsgroups only unless you are
interested in consulting services. All other replies will be ignored :)

"Bob Badour" <bb*****@golden.net> wrote in message
news:IO********************@golden.net...
"David Portas" <RE****************************@acm.org> wrote in message
news:e-********************@giganews.com...
> living people than VendorID values. I don't think an identity VendorID > value is any more artificial than any other unique way of
identifying
a > vendor, so long as it is assigned the first time a vendor enters the
> system and is never changed.

But the point is that a "natural" key is verifiable outside of the

system.

As soon as one records the generated identity value outside the system,

the
key is verifiable outside of the system. All keys are surrogates or
artificial keys. Natural keys are nothing more than familiar surrogates.

Jul 20 '05 #38

Bob Badour

"Trey Walpole" <tr********@SPcomcastAM.net> wrote in message
news:#G**************@TK2MSFTNGP09.phx.gbl...

"David Portas" <RE****************************@acm.org> wrote in message
news:e-********************@giganews.com...
living people than VendorID values. I don't think an identity VendorID value is any more artificial than any other unique way of identifying a vendor, so long as it is assigned the first time a vendor enters the
system and is never changed.

But the point is that a "natural" key is verifiable outside of the system.
When I see a NG post from "skass[at]drew.edu" I don't care whether that's based on your "real" name or even whether S.Kass is the same name as on

your
passport or driver's licence. What's important to me is that it's

determined
by a consistent method outside of the system which gives me some

acceptable
degree of confidence that you're the same person who posted here as
"skass[at]drew.edu" yesterday. Of course that validity is destroyed if you change your email address or if someone spoofs your address. But it's

still
intrinsically better than an arbitrary ID allocated by the server.

I am a firm believer that natural keys should only be used to logically
design/normalize the data. When it comes to the real reason for keys, data
integrity, more often than not I have seen that natural keys are
intrinsically not good physical primary keys.

You are evidently ignorant of the purpose of a key. A candidate key is a
logical identifier that identifies data. It is not a physical identifier. A
dbms can map the logical identifier to any physical identifier using any
method without altering the logical data model.

I suggest an elementary education in data management is in order before you
pontificate on the subject. Spreading your ignorance will only confuse the
unwary novice. Shame on you.

Jul 20 '05 #39

Bob Badour

"Trey Walpole" <tr********@SPcomcastAM.net> wrote in message
news:uP**************@TK2MSFTNGP12.phx.gbl...

And what happens if someone does a DBCC checkident ('FOO', RESEED)?

Or you have to copy it into a new table and accidently set that table up
with an IDENTITY column and now all your rows get new IDs?
Fire the DBA. If they've allowed unskilled people such access or do

anything accidentally, no telling what other problems they'll cause ;)

Pray your employer never employs the same standard.

Jul 20 '05 #40

Bob Badour

"Trevor Best" <bouncer@localhost> wrote in message
news:1r********************************@4ax.com...

On Tue, 2 Dec 2003 09:04:35 -0500 in comp.databases, "Bob Badour"
<bb*****@golden.net> wrote:

"Louis Davidson" <dr***************@hotmail.com> wrote in message
news:#r**************@tk2msftngp13.phx.gbl...
While it is true that the chosen primary key cannot contain any optional values, it is more the praticioner (sp?) that disagrees with this stance. Compound keys are unwieldy and bad for performance, but the theorist in me says "who cares?" It is all about what is right/best, not what is
fastest/easiest.

Performance is determined by the physical structure and not by the logicalinterface. Legitimate theorists have written ad nauseum on the severe
logical problems caused by using compound keys for references when data maybe missing.

Are you talking about data being missing from a foreign key side of a
relationship? I'm pro identity column myself but I don't see how that
would help in this instance.

The specific issue is compound keys and missing information. A simple key
does not exhibit the same problems regardless whether it is an identity
column.

Consider a compound key with attributes A and B. What happens when the user
inserts a referencing row with a known A and an unknown B? Should the dbms
allow the insert? When should it allow the insert? Should the dbms verify
the A exists at least once in the referenced table? If the A value exists
only once in the referenced table, should the dbms substitute the only
corresponding B value that could be correct in the inserted row? Suppose the
user deletes all the rows from the referenced table that contain a specific
A value. What happens when the referencing table contains corresponding rows
with a known A and an unknown B?

Jul 20 '05 #41

Bob Badour

"Trevor Best" <bouncer@localhost> wrote in message
news:u8********************************@4ax.com...

On Tue, 2 Dec 2003 17:53:44 -0000 in comp.databases, "David Portas"
<RE****************************@acm.org> wrote:
living people than VendorID values. I don't think an identity VendorID
value is any more artificial than any other unique way of identifying a
vendor, so long as it is assigned the first time a vendor enters the
system and is never changed.

But the point is that a "natural" key is verifiable outside of the system.
When I see a NG post from "skass[at]drew.edu" I don't care whether that's
based on your "real" name or even whether S.Kass is the same name as on yourpassport or driver's licence. What's important to me is that it's determinedby a consistent method outside of the system which gives me some acceptabledegree of confidence that you're the same person who posted here as
"skass[at]drew.edu" yesterday. Of course that validity is destroyed if youchange your email address or if someone spoofs your address. But it's stillintrinsically better than an arbitrary ID allocated by the server.

If Steve leaves that educational facility he's at now, his email
address will surely change. If he goes into a witness protection
scheme his name, address and SS number will change, he or someone else
could change quiet a bit about him but if he's on a database with an
identity column as his PK then it's more likely that it will *never*
change.

With all due respect, the whole point of the witness protection programme is
to prevent people from associating the individual with their previous
identity.

Jul 20 '05 #42

Bob Badour

"Stijn Verrept" <sv******@nospan.vub.ac.be> wrote in message
news:#G**************@TK2MSFTNGP10.phx.gbl...

I've read through this thread but I don't understand it. I always use an
int or smallint as primary key, with identity. I believe it would be a mess otherwise.
Your belief does not alter the correct criteria for choosing a key:
simplicity, familiarity and stability.

Example: I have a table with people, last name, first name, address, ... So suppose you would make a natural key then you need at least the last name
and the first name. I have >25 other tables that reference that table. If I get this right I will need to use the name and firstname field in all the other tables as well to reference. Isn't that just a lot of data waste? If I'm missing something, please tell me what because this seems a bit silly.

You have constructed a straw man. One anecdote does not demonstrate or
justify a general principle or rule.

Jul 20 '05 #43

Bob Badour

I see why you were elected a most vociferous person. That's a very long way
of stating something as simple as the criteria for choosing a primary key:
simplicity, familiarity and stability.

"Aaron Bertrand [MVP]" <aa***@TRASHaspfaq.com> wrote in message
news:en**************@TK2MSFTNGP10.phx.gbl...

You have definitely hit a couple of the strong points in IDENTITY's favor.
But there are cases where a "natural" kind of key can be small as well.
Consider when eBay bought PayPal (and other than that, the rest of this
paragraph is completely fictional), they likely had to merge some data...
perhaps eBay used an IDENTITY to generate customer numbers, but they want to align those primary keys with the new data in the PayPal tables. So, the
keys in the PayPal data become INTs, but not IDENTITY. They are kind of
"natural" because they came to the PayPal from an external source, so to
speak, rather than generated arbitrarily from within.

Of course, completely fictional. But surely you can see that not all
natural keys are going to be larger than an IDENTITY, or less efficient.
There are other examples, too. In a small stats system, a SMALLDATETIME
could be the primary key (perhaps several subrelated tables are organized by day). In fact, part of http://www.aspfaq.com/stats.asp (and plenty more
that you can't see) is derived on a set of tables where SMALLDATETIME is the only key of relevance. Okay, so that's still 4 bytes, but you save 4 if
your other alternative is to store an IDENTITY along with the SMALLDATETIME value. Consider:

CREATE TABLE calendar
(
dateValue SMALLDATETIME PRIMARY KEY
)

vs.

CREATE TABLE calendar
(
dateID INT IDENTITY PRIMARY KEY,
dateValue SMALLDATETIME NOT NULL
)

Never mind my goofy naming scheme. :-)

Now, Kass could probably show me some cool dateadd tricks that would allow
me to store just an INT (or maybe even a SMALLINT, depending on the date
range required), and determine what the date value is at runtime. Not that I think that's what his argument would be, but rather just to show that it
is still possible to choose either route. I think the usability of the date value representing what it is, rather than having to derive its value from
some formula, is a good thing.

In cases like e-mail address and SSN (and in fact most cases), I still
prefer your route, where there is a surrogate key (IDENTITY) that prevents
me from having to cascade changes all over the place, and store larger
foreign keys.

Firstname + lastname is obviously a bad choice for a key of any kind,
because I know more than one Aaron Bertrand. So then you bring middle name into the key, and it can still be repeated. Other things like getting
adopted, re-married, legally changing their name, and other reasons why this "key" would change are minor; changes to the key can be dealt with in the
database using DRI/CASCADE or, worst case scenario, through rigorous update code; it will be tougher to re-train users to look up all the tables
containing "Carmen Bertrand" instead of "Carmen Electra." :-) However, I
think the possibility of two people having the same key is a far more
compelling argument for bypassing the natural key and placing some
meaningless identifier, like IDENTITY, that the user doesn't care about and would never have to change.

Now, you might think, "why not bring SSN into the FirstName + MiddleName +
LastName key? That would make it unique." Yes, and hideously large. If
SSN is unique, then why not just use SSN as the key? Again, it's large even on its own (CHAR(9) and surely to become CHAR(10) in our lifetimes), so I
fail to see the benefit of repeating the value in every related table, DRI
or not.

Sorry about the earful, sometimes I get a little typographical diarrhea.
Hopefully that was at least marginally intelligible.

--
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/

"Stijn Verrept" <sv******@nospan.vub.ac.be> wrote in message
news:#G**************@TK2MSFTNGP10.phx.gbl...
I've read through this thread but I don't understand it. I always use an int or smallint as primary key, with identity. I believe it would be a

mess
otherwise.

Example: I have a table with people, last name, first name, address, ...

So
suppose you would make a natural key then you need at least the last name and the first name. I have >25 other tables that reference that table.

If
I get this right I will need to use the name and firstname field in all

the
other tables as well to reference. Isn't that just a lot of data waste?

If
I'm missing something, please tell me what because this seems a bit silly.

Stijn Verrept.

Jul 20 '05 #44

Stijn Verrept

"Bob Badour" <bb*****@golden.net> wrote in message
news:Od********************@golden.net...

Your belief does not alter the correct criteria for choosing a key:
simplicity, familiarity and stability.
Well non natural keys do meet the simplicity and stability criteria. Even
more than natural keys I believe.
You have constructed a straw man. One anecdote does not demonstrate or
justify a general principle or rule.

Hmmm one anecdote? I have a database full of tables like this. List of
doctors, departments, users, contacts, medication, ... everywhere I used a
non natural key. The use of natural keys will only happen sometimes when
it's really appropriate, otherwise not.
Stijn Verrept.

Jul 20 '05 #45

David Portas

Do you really allow the same Doctor, Department, etc to appear twice in its
table with different keys? If you don't declare unique natural keys then
that's the kind of problem you have. An IDENTITY isn't a *surrogate* key at
all unless the table also has a natural key - it's just a physical row
identifier.

--
David Portas
------------
Please reply only to the newsgroup
--

Jul 20 '05 #46

Bruce Lewis

"Trey Walpole" <tr********@SPcomcastAM.net> writes:

I am a firm believer that natural keys should only be used to logically
design/normalize the data. When it comes to the real reason for keys, data
integrity, more often than not I have seen that natural keys are
intrinsically not good physical primary keys.
This differs from my experience.
1. Natural keys are, being natural and therefore user entered [i.e.,
provided to the database by external means], fungible. If a user enters
data, they must also be able to modify it. If data can be modified, then its
value as a systemic primary key is gone. Yes, you can cascade updates to
these, but why do it when it can be avoided to start with.
Why do it? To avoid duplicates of course. Why not do it? You don't
seem to be making any sort of case here.
2. Natural keys are typically a composite of atomic attributes. If using a
composite, these must be propagated to referencing tables as foreign keys.
Your normalization drops below par, by having these [potentially] massively
duplicated columns.
I agree that natural keys should be avoided because they're too large.
However, most business reports I see typically have columns that consist
of abbreviations chosen to make the report less wide. These make great
natural keys.
Attributes that are single, [supposedly] unique attributes (e.g., SSN),
usually represent some official, governmentally recognized ID, and therefore
have legal issues with being propagated throughout a system.
Yes, SSNs are problematic because too many organizations use them for
authentication, i.e. "You put Trey Walpole's SSN on this form, so you must
be Trey Walpole". Even in the absence of such stupid organizations,
privacy advocates oppose national IDs for a very good reason: such IDs
make it easy to create good databases that include people. However, I
assume the original poster had the opposite goal: make it easy to create
good databases.
Also, for amateurs and many professionals, natural keys are very often
chosen incorrectly. e.g., I believe some combination of Name and other info
has been used by my ISP as their primary key. My last name was entered into
their system incorrectly, but they cannot fix it because their system will
not allow it. Preposterous and poor design.
It sounds like this "preposterous and poor design" could have been fixed
with a simple REFERENCES ... ON UPDATE CASCADE. Perhaps you should
offer them your services. I can't tell from your story whether or not
they used a poor choice of natural key.

Yes, people can make bad choices as to natural primary keys, but I think
this one additional opportunity to do bad database design is well worth
the risk, given the problems that arise from redundant or duplicate
data.
Surrogate keys generated by using the identity property are ideal for data
integrity, because
1. They are static values [i.e., once entered, it does not change] and the
DBA has control over allowing values in identity columns to be modified.
This is only meaningful if there's something wrong with ON UPDATE
CASCADE, which I think there isn't.
2. They are singleton row ids.

And thus the problem. Earlier this year an e-mail alert system I wrote
was sending two copies when it should just send one. Looking into it,
the employee table had been doubled. I switch to a select distinct to
work around the problem, and someone deleted the duplicates. If we had
a natural primary key for the employee table, I doubt those duplicates
would have gone in.

Primary key constraint errors are your friends.

Jul 20 '05 #47

Stijn Verrept

"David Portas" <RE****************************@acm.org> wrote in message
news:Nv********************@giganews.com...

Do you really allow the same Doctor, Department, etc to appear twice in its table with different keys? If you don't declare unique natural keys then
that's the kind of problem you have. An IDENTITY isn't a *surrogate* key at all unless the table also has a natural key - it's just a physical row
identifier.

I never said I allow them to appear twice in the column, you have Unique
Constraint for that. I could use that as a natural key, but I prefer using
an int or smallint. I don't want to note Name, Firstname, ... in another
table as foreign key! Also in the application I don't see me writing:
select SN_Active from seniors where (SN_Name = :SNName) and (SN_FirstName =
:FirstName) and (SN_BirthDate = :SNBirthDate).
Stijn Verrept.

Jul 20 '05 #48

Steve Kass

Stijn Verrept wrote:

"David Portas" <RE****************************@acm.org> wrote in message
news:Nv********************@giganews.com...

Do you really allow the same Doctor, Department, etc to appear twice in

its

table with different keys? If you don't declare unique natural keys then
that's the kind of problem you have. An IDENTITY isn't a *surrogate* key

at

all unless the table also has a natural key - it's just a physical row
identifier.

I never said I allow them to appear twice in the column, you have Unique
Constraint for that. I could use that as a natural key, but I prefer using
an int or smallint. I don't want to note Name, Firstname, ... in another
table as foreign key! Also in the application I don't see me writing:
select SN_Active from seniors where (SN_Name = :SNName) and (SN_FirstName =
:FirstName) and (SN_BirthDate = :SNBirthDate).
Stijn Verrept.

How about putting a UNIQUE NOT NULL CLUSTERED constraint on the identity
column and putting the PRIMARY KEY NONCLUSTERED constraint on the
multi-column primary key?

That might confuse the anti-identity fanatics enough so they'll stop
complaining. You will have a natural primary key, so they won't think
the world is coming to an end, but you will go on as you always have,
using the identity column for its convenience in queries, FK
constraints, etc. ;)

SK

Jul 20 '05 #49

Aaron Bertrand - MVP

> How about putting a UNIQUE NOT NULL CLUSTERED constraint on the identity

column and putting the PRIMARY KEY NONCLUSTERED constraint on the
multi-column primary key?

Certainly, I suppose you could...

CREATE TABLE splunge
(
splungeID INT IDENTITY(1,1) NOT NULL UNIQUE,
email VARCHAR(128) PRIMARY KEY CLUSTERED
)
GO

CREATE TABLE blat
(
splungeID INT NOT NULL
FOREIGN KEY REFERENCES splunge(splungeID)
)
GO

--
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/

Jul 20 '05 #50

What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000?

Similar topics