By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
429,087 Members | 1,508 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 429,087 IT Pros & Developers. It's quick & easy.

AutoNumber vs. Key

P: n/a
Tom
Hi,

I am always confused about what to use as the key for a table.

Let's say I have a company-employee table.

[company]---*[employee]

My co-worker likes to use an AutoNumber or Counter as the key for the
[employee] table (and everything).
I personally use an actual key set for the [employee] table.

So, his table will have one (Autonumber or LONG) column as the key. The
company_id is just another attribute.
Mine, on the other hand, has 2 columns as the key. (i.e.) company_id +
employee_id

So, what is the deal?

--
There is no answer.
There has not been an answer.
There will not be an answer.
That IS the answer!
And I am screwed.
Deadline was due yesterday.

There is no point to life.
THAT IS THE POINT.
And we are screwed.
We will run out of oil soon.

Oct 11 '05 #1
Share this Question
Share on Google+
22 Replies


P: n/a
always use an autonumber identity for everythign you do

single-column primary key

Oct 11 '05 #2

P: n/a
"For everything" is a pretty obscure answer. Other keys are important
too. In a joining table containing only two foreign keys for example
what purpose would IDENTITY serve?

Tom, for the debate you can Google for "Natural Key" and "Surrogate
Key" in this group and in microsoft.public.sqlserver.programming.

--
David Portas
SQL Server MVP
--

Oct 11 '05 #3

P: n/a
Tom
Hi,

Is there a reason for using one and only one column as the key?
I find it hard to enforce the integrity of the database with this design.

for example,
let's say the [company]-[employee]-[sale] tables.
company A has an employee B.
employee B makes a sale C.

If employee B moves to another company, the sale C in the [sale] table will
move along with employee B to the new company.
This doesn't look correct to me.

<db*******@hotmail.com> wrote in message
news:11*********************@g47g2000cwa.googlegro ups.com...
always use an autonumber identity for everythign you do

single-column primary key

Oct 11 '05 #4

P: n/a
Tom
Thanks for the actual terms for these key thingies.

I've searched google but found a lot of different opinions.
Some like autonumber but some like... huh... natural key.

To sum it up from all the posts I've read:
1)
The people who use autonumber said that it performed better and easier to
update the record without affecting the database integrity. (imagine
someone needs to change his or her social security number!!!! all hell broke
lose!!!)

2)
The people who use natural key said that people who use surrogate were just
lazy and not experience with database design. And using natural key could
better model database to the business model.

So... I don't know... that's why I am so confused.
One has physical advantage. The other has logical advantage.

"David Portas" <RE****************************@acm.org> wrote in message
news:11**********************@z14g2000cwz.googlegr oups.com...
"For everything" is a pretty obscure answer. Other keys are important
too. In a joining table containing only two foreign keys for example
what purpose would IDENTITY serve?

Tom, for the debate you can Google for "Natural Key" and "Surrogate
Key" in this group and in microsoft.public.sqlserver.programming.

--
David Portas
SQL Server MVP
--

Oct 11 '05 #5

P: n/a
Tom (no****@yahoo.com) writes:
I am always confused about what to use as the key for a table.

Let's say I have a company-employee table.

[company]---*[employee]

My co-worker likes to use an AutoNumber or Counter as the key for the
[employee] table (and everything).
I personally use an actual key set for the [employee] table.

So, his table will have one (Autonumber or LONG) column as the key. The
company_id is just another attribute.
Mine, on the other hand, has 2 columns as the key. (i.e.) company_id +
employee_id


I would say that the two alternatives describes two different things.
In the one-column case, an employee is rather a person, and if he
moves to another company all data that is tied to him are brought with
him.

In the two-column case, you get a new row when he gets a new job.

The latter appears more useful to me. If we want to know who much sales
did everyone in company X generate last year, we can't tell with the
first setup, since we don't know who worked where that year.

If want to know how much aales Bob has generated over the years in
different company, maybe there should be a person table, to which
employees have a reference.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Oct 11 '05 #6

P: n/a
Tom
I come to the same conclusion. (see the 10/11/2005 1:03pm post)

As I search Google, this thing is getting very confusing. All the posts
point to the one dead end conclusion -- "It depends".

I need something solid so I don't have to reinvent the wheel (or think about
it). The "It depends" thing just doesn't cut it.

"Erland Sommarskog" <es****@sommarskog.se> wrote in message
news:Xn**********************@127.0.0.1...
Tom (no****@yahoo.com) writes:
I am always confused about what to use as the key for a table.

Let's say I have a company-employee table.

[company]---*[employee]

My co-worker likes to use an AutoNumber or Counter as the key for the
[employee] table (and everything).
I personally use an actual key set for the [employee] table.

So, his table will have one (Autonumber or LONG) column as the key. The
company_id is just another attribute.
Mine, on the other hand, has 2 columns as the key. (i.e.) company_id +
employee_id


I would say that the two alternatives describes two different things.
In the one-column case, an employee is rather a person, and if he
moves to another company all data that is tied to him are brought with
him.

In the two-column case, you get a new row when he gets a new job.

The latter appears more useful to me. If we want to know who much sales
did everyone in company X generate last year, we can't tell with the
first setup, since we don't know who worked where that year.

If want to know how much aales Bob has generated over the years in
different company, maybe there should be a person table, to which
employees have a reference.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Oct 12 '05 #7

P: n/a

"Tom" <no****@yahoo.com> wrote in message
news:hi***************@newssvr21.news.prodigy.com. ..
I come to the same conclusion. (see the 10/11/2005 1:03pm post)

As I search Google, this thing is getting very confusing. All the posts
point to the one dead end conclusion -- "It depends".

I need something solid so I don't have to reinvent the wheel (or think about it). The "It depends" thing just doesn't cut it.
"It depends" .

Basically I tend to agree with Celko's philosophy that wheer possible the
key should model reality. If your employees already have some sort of
employee number assigned, use that.

That said, even in many of my tables with natural keys I'll drop in an
Identity column as it can be useful for some ad-hoc queries.



"Erland Sommarskog" <es****@sommarskog.se> wrote in message
news:Xn**********************@127.0.0.1...
Tom (no****@yahoo.com) writes:
I am always confused about what to use as the key for a table.

Let's say I have a company-employee table.

[company]---*[employee]

My co-worker likes to use an AutoNumber or Counter as the key for the
[employee] table (and everything).
I personally use an actual key set for the [employee] table.

So, his table will have one (Autonumber or LONG) column as the key. The company_id is just another attribute.
Mine, on the other hand, has 2 columns as the key. (i.e.) company_id +
employee_id


I would say that the two alternatives describes two different things.
In the one-column case, an employee is rather a person, and if he
moves to another company all data that is tied to him are brought with
him.

In the two-column case, you get a new row when he gets a new job.

The latter appears more useful to me. If we want to know who much sales
did everyone in company X generate last year, we can't tell with the
first setup, since we don't know who worked where that year.

If want to know how much aales Bob has generated over the years in
different company, maybe there should be a person table, to which
employees have a reference.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp


Oct 12 '05 #8

P: n/a
Stu
Just my .02:

If you have a natural key that is VERY constant (more constant than a
name, because names can change) and is simple, use it. If you don't,
use a surrogate key, but be sure that your application and database can
handle data verification (avoiding duplicate records, etc).

Stu

Oct 12 '05 #9

P: n/a
Tom (no****@yahoo.com) writes:
I come to the same conclusion. (see the 10/11/2005 1:03pm post)

As I search Google, this thing is getting very confusing. All the posts
point to the one dead end conclusion -- "It depends".


Yes, that is a very common answer in the database world. And that does
not only apply to data modelling, but also how you should apply indexes,
write queries, implement backup strategies, you name it. It all matters
very much on the actual situations, and which trade-offs you want to make.

So in the end, it boils down do common sense and gut feeling for the
actual case at hand. What is bad is to follow a checklist that says
"Always do X".

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Oct 12 '05 #10

P: n/a
On Tue, 11 Oct 2005 20:44:08 GMT, "Tom" <no****@yahoo.com> wrote:
Thanks for the actual terms for these key thingies.

I've searched google but found a lot of different opinions.
Some like autonumber but some like... huh... natural key.

To sum it up from all the posts I've read:
1)
The people who use autonumber said that it performed better and easier to
update the record without affecting the database integrity. (imagine
someone needs to change his or her social security number!!!! all hell broke
lose!!!)

2)
The people who use natural key said that people who use surrogate were just
lazy and not experience with database design. And using natural key could
better model database to the business model.

So... I don't know... that's why I am so confused.
One has physical advantage. The other has logical advantage.


Unfortunately, all any of us can do is add more opinions to the pile, which is
not likely to generate any final closure, just give you more to ponder. I'll
try to offer what I think is useful, though.

- A table may have one key or more than one "key". It's not so much a
quesiton of whether a table should have a logical or surrogate key as whether
it should have a surrogate key in addition to one or more logical keys, and
use the surrogate key for joins.

- When multiple levels of 1-to-many relationship exist, surrogate keys can
prevent adding additional key fields at each level and coupling fine levels of
detail too closely to the models at courser levels.

- When a table has a surrogate key that is always of the same type and
following the same naming convention, it's easier to generate reusable
client-side code that can do things like keep track of a record, requery it,
save updates after an off-line editing session, etc.

- Surrogate keys tend to be 32-bit numbers that are fairly small, and easy for
the database engine to process.

- Using cascading updates to handle logical key change propagation can result
in very expensive updates with wide-ranging run-time side effects.

- Using surrogate keys to identify record links across subsystems can lead to
excessively tight coupling and interdependency. If 2 subsystems should be
relatively autonomous, they should very likely -not- reference each other's
records by internal surrogate key. Instead, they might use the logical key,
or they might use some kind of globally unique identifier. They should also
be built to tolerate broken links.
Oct 12 '05 #11

P: n/a
On Tue, 11 Oct 2005 20:03:49 GMT, "Tom" <no****@yahoo.com> wrote:
Hi,

Is there a reason for using one and only one column as the key?
I find it hard to enforce the integrity of the database with this design.

for example,
let's say the [company]-[employee]-[sale] tables.
company A has an employee B.
employee B makes a sale C.

If employee B moves to another company, the sale C in the [sale] table will
move along with employee B to the new company.
This doesn't look correct to me.


How does this problem relate to the use of logical vs surrogate keys? The
real, problem is that this model does not track what company the sale is made
for, and traversing the employee doesn't tell what company that is, it tells
what company the employee who made the sale -currently- works for. [sale]
needs its own direct link to [company] if that information needs to be
retrieved later.
Oct 13 '05 #12

P: n/a
db*******@hotmail.com wrote:
always use an autonumber identity for everythign you do

single-column primary key


And this advice is based on what computer science class taught where?
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)
Oct 13 '05 #13

P: n/a
Your whole mental model of RDBMS is wrong.

Exposed locators such as IDENTITY are not surrogate keys. A surrogate
is totally hidden from uers and maintained by the system -- think of
how an index works. If you change a natural key, the DRI actions will
cascade it for you. Hell does not break loose. But if your autonumber
and real key are out of synch,then you are screwed. Just enter the
same record several times to get different autonumbers on duplicate
rows. Drop all but one of the dups and then try to find all of the
referenced rows in other tables.

If you maintain the exposed locator by hand, you have extra work, extra
disk seeks and will eventually make a mistake, thus destroying your
data integrity. How would you validate and verify your data?

The UPC code just went from 10 to 13 digits. It does not matter if you
used an autonumber or the UPC, the UPC has to be updated. If you had
used a proper RDBMS design and SQL-92, you would do an ALTER DOMAIN and
a single update with CASCADE. This is one of the MANY reasons that
rows are not records. This is a problem in SQL Server because it is
still based on a contigous storage model, but other products are not,
so wind up using tools to write scripts.

Finally have you bothered to time the two approaches with small and
large tables? It does not make a difference until the table has to do
a lot of paging and redundant autonumbers accumulate. This is the
21-st century; we have 32 and 64 bit hardware, nano-second speeds,
multi-word instructions, parallel processing and really good disk
drives.

Oct 13 '05 #14

P: n/a
Tom
thanks for the tips.

But I need to follow the specs. The specs says that cascade update or
delete is not supported (along many other things like joinning tables is not
allowed). And it comes to a point that I have to say screw this. I am
going to use autonumber.

I am trying to look for any sounded and proven pattern. But searching
Google gives me only the "it depends" solution.

"--CELKO--" <jc*******@earthlink.net> wrote in message
news:11**********************@z14g2000cwz.googlegr oups.com...
Your whole mental model of RDBMS is wrong.

Exposed locators such as IDENTITY are not surrogate keys. A surrogate
is totally hidden from uers and maintained by the system -- think of
how an index works. If you change a natural key, the DRI actions will
cascade it for you. Hell does not break loose. But if your autonumber
and real key are out of synch,then you are screwed. Just enter the
same record several times to get different autonumbers on duplicate
rows. Drop all but one of the dups and then try to find all of the
referenced rows in other tables.

If you maintain the exposed locator by hand, you have extra work, extra
disk seeks and will eventually make a mistake, thus destroying your
data integrity. How would you validate and verify your data?

The UPC code just went from 10 to 13 digits. It does not matter if you
used an autonumber or the UPC, the UPC has to be updated. If you had
used a proper RDBMS design and SQL-92, you would do an ALTER DOMAIN and
a single update with CASCADE. This is one of the MANY reasons that
rows are not records. This is a problem in SQL Server because it is
still based on a contigous storage model, but other products are not,
so wind up using tools to write scripts.

Finally have you bothered to time the two approaches with small and
large tables? It does not make a difference until the table has to do
a lot of paging and redundant autonumbers accumulate. This is the
21-st century; we have 32 and 64 bit hardware, nano-second speeds,
multi-word instructions, parallel processing and really good disk
drives.

Oct 13 '05 #15

P: n/a
Tom
Must be from somewhere in India where they program $QL $erver for Bill
Gates.

Mahahahaa!!! <G!>

"DA Morgan" <da******@psoug.org> wrote in message
news:1129217676.921233@yasure...
db*******@hotmail.com wrote:
always use an autonumber identity for everythign you do

single-column primary key


And this advice is based on what computer science class taught where?
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)

Oct 13 '05 #16

P: n/a
> The specs says [...] joinning tables is not allowed [!!]

Quit now. Time spent on such a spec is time wasted.

--
David Portas
SQL Server MVP
--

Oct 13 '05 #17

P: n/a
Tom
They do pay the bills though. ;)
"David Portas" <RE****************************@acm.org> wrote in message
news:11**********************@g47g2000cwa.googlegr oups.com...
The specs says [...] joinning tables is not allowed [!!]


Quit now. Time spent on such a spec is time wasted.

--
David Portas
SQL Server MVP
--

Oct 13 '05 #18

P: n/a
On 13 Oct 2005 10:36:00 -0700, --CELKO-- wrote:
Your whole mental model of RDBMS is wrong.
Hi Joe,

You're whole mental model of surrogate keys is wrong.

Exposed locators such as IDENTITY are not surrogate keys. A surrogate
is totally hidden from uers and maintained by the system -- think of
how an index works.
So don't display the surrogate key on the data entry screen.

If you change a natural key, the DRI actions will
cascade it for you. Hell does not break loose.
If you change a natural key in a table that has a surrogate key as well,
there's no need to cascade it, as there's no redundant storage of the
key values. Hell does not break losse AND you don't need to get locks on
half the tables in the database.

But if your autonumber
and real key are out of synch,then you are screwed.
Eh? How can they "get out of synch"? Please post proper DDL and DML to
clarify what you mean.

Just enter the
same record several times to get different autonumbers on duplicate
rows. Drop all but one of the dups and then try to find all of the
referenced rows in other tables.
That's not a result of IDENTITY, that's a result of forgetting to
include a CHECK constraint. Database theory 101.

If you maintain the exposed locator by hand, you have extra work, extra
disk seeks and will eventually make a mistake, thus destroying your
data integrity.
That's the point - you DON'T maintain the surrogate key by hand.

How would you validate and verify your data?
Same as you would do it: use the natural key.

The UPC code just went from 10 to 13 digits. It does not matter if you
used an autonumber or the UPC, the UPC has to be updated. If you had
used a proper RDBMS design and SQL-92, you would do an ALTER DOMAIN and
a single update with CASCADE.
For starters, SQL Server 2000 doesn't have a switch that customers can
just flick to get full SQL-92 compliance. There is no ALTER DOMAIN
command in SQL Server 2000.

Fortunately, those who had the foresight to add a surrogate key to the
Products table don't need ALTER DOMAIN, nor CASCADE. A simple ALTER
TABLE and a single UPDATE on one single table suffices.

This is one of the MANY reasons that
rows are not records. This is a problem in SQL Server because it is
still based on a contigous storage model, but other products are not,
so wind up using tools to write scripts.
You mean to imply that other RDBMS's have no autonumber facility? Get
off it. Less than one minute of googling gave me:

- SQL Server, DB2: IDENTITY (with some variations in syntax);
- MySQL, SQLite: AUTO_INCREMENT (with some variations in syntax);
- Postgres, Oracle, Sybase, Mimer: CREATE SEQUENCE (with some variations
in syntax).

http://sqlzoo.net/howto/source/z.dir/tip000001

Finally have you bothered to time the two approaches with small and
large tables?
The OP was not asking for an advice for a specific situation, but for a
generic truth on whether to use or avoid identity. Which of course there
isn't - since your religious insisting that IDENTITY should never be
used is just as fundamentally wrong as brainlessly whacking an IDENTITY
column onto each and every table.

In databases, "it depends" is the only reliable answer.

It does not make a difference until the table has to do
a lot of paging and redundant autonumbers accumulate.
Redundant autonumbers? Care to elaborate?

This is the21-st century; we have 32 and 64 bit hardware, nano-second speeds,
multi-word instructions, parallel processing and really good disk
drives.


But we also have applications that gather hundreds of rows per second,
and queries that run on million-row tables - situations that can't be
"tuned" by just throwing more and better hardware at it. Performance s
just as important a consideration as it ever was. And including or
removing a surrogate key can (in some situations) improve or damage
performance by several degrees of magnitude.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
Oct 13 '05 #19

P: n/a
Tom wrote:
thanks for the tips.

But I need to follow the specs.


Not if the specification is valueless or harmful.

Part of what you are supposed to bring to the party is expert knowledge
with respect to relational database systems. You are supposed to
contribute ... not just follow orders.

Go back, with cogent reasoned arguments, and ask for changes.
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)
Oct 13 '05 #20

P: n/a
>> The specs says that cascade update or delete is not supported (along many other things like joinning tables is not allowed). <<

No joins means that you have a (realllly crappppy) file system and not
an RDBMS.
I am trying to look for any sounded and proven pattern. <<


The only proven patern I know in this situation based on 35+ yerts in
the trade is to update your resume and leave the sinking ship. Do it
fast so the stink of failure does not hang on your clothing.

..

Oct 14 '05 #21

P: n/a
Tom wrote:
Hi,

Is there a reason for using one and only one column as the key?
I find it hard to enforce the integrity of the database with this design.

for example,
let's say the [company]-[employee]-[sale] tables.
company A has an employee B.
employee B makes a sale C.

If employee B moves to another company, the sale C in the [sale] table will
move along with employee B to the new company.
This doesn't look correct to me.


As a programmer, if you work for a company you already know that code
you write at such company belongs to the company, not to you. Why would
the sale in your example belong to the salesman and not the company in
the first place. The sale would be a sub record of the company, the
salesman is just an attribute.
Oct 14 '05 #22

P: n/a
Tom
Hahahaha!!!
huh... <G!>
Well... they do pay the bill... and they do pay the bill ON TIME.
So... it is all good to me.
not to mention that I've already left my dignity in my house before I set
foot in my cubical.
"--CELKO--" <jc*******@earthlink.net> wrote in message
news:11********************@z14g2000cwz.googlegrou ps.com...
The specs says that cascade update or delete is not supported (along
many other things like joinning tables is not allowed). <<
No joins means that you have a (realllly crappppy) file system and not
an RDBMS.
I am trying to look for any sounded and proven pattern. <<


The only proven patern I know in this situation based on 35+ yerts in
the trade is to update your resume and leave the sinking ship. Do it
fast so the stink of failure does not hang on your clothing.

.

Oct 14 '05 #23

This discussion thread is closed

Replies have been disabled for this discussion.