What datatype to use for PK?

Don Vaillancourt

Over the years I have always used the decimal(18,0) as the datatype for
primary keys. Aside from the number of significant numbers involved,
would BigInt type be better for performance or is decimal(18,0) still okay.
--
Don Vaillancourt
Director of Software Development
WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: do**@web-impact.com <mailto:do**@webimpact.com>
web: http://www.web-impact.com
Web Impact Inc. <http://www.web-impact.com>
This email message is intended only for the addressee(s) and contains
information that may be confidential and/or copyright.

If you are not the intended recipient please notify the sender by reply
email and immediately delete this email.

Use, disclosure or reproduction of this email by anyone other than the
intended recipient(s) is strictly prohibited. No representation is made
that this email or any attachments are free of viruses. Virus scanning
is recommended and is the responsibility of the recipient.

Jul 23 '05 #1

Subscribe Post Reply

3158

Gary

BIGINT is stored one byte smaller than DECIMAL(18,0). 8 bytes vs. 9
bytes.

Not likely to significantly change performance unless you're talking
millions of rows being processed.

Jul 23 '05 #2

Don Vaillancourt

I always thought that decimal stored it's values in text format. So for
decimal(18,0) MSSQL would take up 18 bytes. Or am I partially right and
MSSQL uses a nibble to represent a digit.

Gary wrote:

BIGINT is stored one byte smaller than DECIMAL(18,0). 8 bytes vs. 9
bytes.

Not likely to significantly change performance unless you're talking
millions of rows being processed.

--
Don Vaillancourt
Director of Software Development
WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: do**@web-impact.com <mailto:do**@webimpact.com>
web: http://www.web-impact.com
Web Impact Inc. <http://www.web-impact.com>
This email message is intended only for the addressee(s) and contains
information that may be confidential and/or copyright.

If you are not the intended recipient please notify the sender by reply
email and immediately delete this email.

Use, disclosure or reproduction of this email by anyone other than the
intended recipient(s) is strictly prohibited. No representation is made
that this email or any attachments are free of viruses. Virus scanning
is recommended and is the responsibility of the recipient.

Jul 23 '05 #3

Gary

That is not correct. DECIMAL (18,0) uses 9 bytes stored in hexadecimal
format, not text format. Check BOL.

Jul 23 '05 #4

Hugo Kornelis

On Wed, 09 Feb 2005 15:06:33 -0500, Don Vaillancourt wrote:

I always thought that decimal stored it's values in text format. So for
decimal(18,0) MSSQL would take up 18 bytes. Or am I partially right and
MSSQL uses a nibble to represent a digit.

Hi Don,

From Books Online:

Precision Storage bytes
1 - 9 5
10-19 9
20-28 13
29-38 17

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

Jul 23 '05 #5

tal_mcmahon

Just a thought,
I have found that using unique identifiers for PK columns have benefits
enough to outweigh the size and code differences.

If i use a Unique identifier i do not need to make a return trip to a
client with the id generated by the Database. I can create a Unique
identifier almost anywhere and push the parent pk to the children
without a need for all of the Create a Parent step, now create details,
now create details for the details.

Also, When i use GUID for pk i can mix together values from different
tables and still have a unique column within the view. for example I
can grab address info from both the Customer and the Vendor tables to
create a view that can return all addresses.

anyway, just a thought

Tal

Jul 23 '05 #6

Gary

The only caveat to using unique identifier is that it can create
indexes that are up to four times larger than using an int, for
example. That could cause as much as 4X disk I/O to retrieve your disk
pages. Only a concern for large tables, though.

Jul 23 '05 #7

Erland Sommarskog

Don Vaillancourt (do**@webimpact.com) writes:

I always thought that decimal stored it's values in text format. So for
decimal(18,0) MSSQL would take up 18 bytes. Or am I partially right and
MSSQL uses a nibble to represent a digit.

The format you get up to a client is a record with precision, scale
and sign, and there are up to 16 bytes for the value.

typedef struct tagDB_NUMERIC {
BYTE precision;
BYTE scale;
BYTE sign;
BYTE val[16];
} DB_NUMERIC;

As for the question whether to use bigint or decimal(18,0), my answer is
neither. The prime option should be to use natural keys, and natural
keys rarely calls for 64-bit numbers. When you need artificial keys,
integer should do in most cases.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 23 '05 #8

Gary

Erland, where did you get that typedef code? Is that from SQL Server
source code?

Jul 23 '05 #9

David Portas

> When i use GUID for pk i can mix together values from different

tables and still have a unique column within the view.

Why is that an advantage? If two rows represent the same physical
entity then you usually want to be able to identify them as being the
same - otherwise you end up with unwanted duplicates. That is why
natural keys are important. Billing the same customer twice isn't
usually good for business!

--
David Portas
SQL Server MVP
--

Jul 23 '05 #10

Greg D. Moore \(Strider\)

"Don Vaillancourt" <do**@webimpact.com> wrote in message
news:Ci********************@nnrp1.uunet.ca...

Over the years I have always used the decimal(18,0) as the datatype for
primary keys. Aside from the number of significant numbers involved,
would BigInt type be better for performance or is decimal(18,0) still

okay.

I think it depends on what your primary key is. If it's an ISBN for
example, I don't think either of these would work.

Jul 23 '05 #11

Erland Sommarskog

David Portas (RE****************************@acm.org) writes:

Why is that an advantage? If two rows represent the same physical
entity then you usually want to be able to identify them as being the
same - otherwise you end up with unwanted duplicates. That is why
natural keys are important. Billing the same customer twice isn't
usually good for business!

As long as the customer pays both bills it is!

As for the use of artificial keys - if the thing inserted into the database
is an order by a web customer, there is no natural key until it has been
inserted. The order-entry system must generate the key itself.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 23 '05 #12

Erland Sommarskog

Gary (ga**********@wcc.ml.com) writes:

Erland, where did you get that typedef code? Is that from SQL Server
source code?

The OLE DB Refeference Manual in the MDAC Books Online. It's also in MSDN
Library.

Of course, that's a general interface which is independent of the data
source, but I would not be surprised if the format is taken from SQL
server.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 23 '05 #13

David Portas

A natural key is simply a product of the way some data is represented
in a table - a subset of the attributes that you need to record. In the
case of an online order that might be (user,order_datetime) for
example. A natural key is never "generated" internally because it
always represents something that is "outside the system" - that is the
critical distinction between a natural and artificial key.

--
David Portas
SQL Server MVP
--

Jul 23 '05 #14

Mike Sherrill

On 9 Feb 2005 14:00:03 -0800, "ta*********@hotmail.com"
<ta*********@hotmail.com> wrote:

[snip]

Also, When i use GUID for pk i can mix together values from different
tables and still have a unique column within the view. for example I
can grab address info from both the Customer and the Vendor tables to
create a view that can return all addresses.

I'd like to see the conceptual model behind that.

--
Mike Sherrill
Information Management Systems

Jul 23 '05 #15

David Portas

Conceptually the model is a "bag" rather than a "set" i.e. duplicates
are allowed (the GUID isn't part of the logical model so it doesn't
count). A bag may be useful for a table in a "staging" database as part
of a data transformation process. The transformation then reconciles
and eliminates duplicates so that integrity is maintained.

--
David Portas
SQL Server MVP
--

Jul 23 '05 #16

Erland Sommarskog

David Portas (RE****************************@acm.org) writes:

A natural key is simply a product of the way some data is represented
in a table - a subset of the attributes that you need to record. In the
case of an online order that might be (user,order_datetime) for
example. A natural key is never "generated" internally because it
always represents something that is "outside the system" - that is the
critical distinction between a natural and artificial key.

The natural key for an order - the way users would refer to it - is
the order id. In a paper-based computer-less system, it would be a
pre-printed number on an order book. In a computer-based system, the
order id is generated by the system at the time for the order.

Involving userid is likely to be out of the question in many middle-
tier apps - all orders from the same user. Customer may do better, but
it may be perfectly legal for the same customer to place two orders at
the same time. For instance, the customer may be a company, and two
different departments are making different orders.

And if even if you could construct this natural key, I can tell you that the
users blissfully will ignore it. What they need is an order id, and you
must somehow report that back. In a web app, you sometimes want to avoid
an extra rountrip to get that number. Sending down a GUID from the client
is one way to handle it. A GUID is a poor key for usability, but if it's
reported back to another app in a business-to-business solution, this is
less of an issue.

The idea with a "natural key" as being something "outside the system" may
sound nice, but it does not model the reality for real-life system.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 23 '05 #17

David Portas

> Involving userid is likely to be out of the question in many middle-

tier apps - all orders from the same user
True. There may be a difference between "user" and "customer". Customer
is what I meant. Customers do indeed require an order number generated
by the system - no problem there.
it may be perfectly legal for the same customer to place two orders at the same time. For instance, the customer may be a company, and two
different departments are making different orders.
Say, two orders are placed simultaneously by different entities
(departments or whatever) and you don't record the information that
distinguishes them then how will you know which is which? The
artificial order number doesn't tell you that. In your hypothetical
scenario wouldn't the order confirmation emails go to the same address?
So even the customer's Accounts Payable department wouldn't know which
items on the order belonged to which department unless the end-user
recorded that information separately. Is there a benefit in issuing
multiple order numbers in this scenario? I'm not sure, but if there
were and you wanted to persist information about different, arbitrary
"sub-groups" of order items with all other attributes of the order
identical then in 3NF that information belongs as an attribute in the
Order Details table, not as a duplicate in the Order table. That's
unless the "department" attribute was recorded as well, in which case
Department would obviously become part of the Order table's key.
The idea with a "natural key" as being something "outside the system" may sound nice, but it does not model the reality for real-life system.

I thought "real-life" was "outside the system" but maybe your
definition of reality differs from mine. :-)

--
David Portas
SQL Server MVP
--

Jul 23 '05 #18

Mike Sherrill

On 10 Feb 2005 03:28:00 -0800, "David Portas"
<RE****************************@acm.org> wrote:

Conceptually the model is a "bag" rather than a "set" i.e. duplicates
are allowed (the GUID isn't part of the logical model so it doesn't
count).

Not exactly where I was going, but it's in the same direction. I was
using "conceptual model" in the sense Halpin uses it in ORM. And I'd
still like Tal to post it.

--
Mike Sherrill
Information Management Systems

Jul 23 '05 #19

Don Vaillancourt

Well no, the PK would be an integer.

Greg D. Moore (Strider) wrote:

"Don Vaillancourt" <do**@webimpact.com> wrote in message
news:Ci********************@nnrp1.uunet.ca...
Over the years I have always used the decimal(18,0) as the datatype for
primary keys. Aside from the number of significant numbers involved,
would BigInt type be better for performance or is decimal(18,0) still

okay.

I think it depends on what your primary key is. If it's an ISBN for
example, I don't think either of these would work.

Jul 23 '05 #20

--CELKO--

>> Over the years I have always used the decimal(18,0) as the datatype
for primary keys. <<

That's wrong! The best key is the 17-digit Hebrew number that God puts
on the bottom of everything in creation!
Sounds pretty silly, doesn't it? But this the same thing as your
question.

There is no "magical, Universal, one-size-fits-all" key. There are
industry standards and natural keys, which you discover with research.
There are techniques for designing keys when the research fails --
check digits, grep patterns, validation rules, etc.

If you have been blindly writing DECIMAL(18,0) as the key on your
tables, then you probably have never actually designed an RDBMS.
Instead, you have been faking pointer chains and have not had any
relational keys.

You might want to get a course in the basics and learn exactly what a
key is. I would also do a full data audit on what you have now.

Jul 23 '05 #21

tal_mcmahon

Ok,
I think my example of address was taken a bit to literally. I used it
only as an example. Where I was going with this is that i can have a
table that I can mix foreign keys from different parents. Before you
fall over or start the flame war, here is a system, There are reports,
there are Contacts, Vehicles, and Property all attached to these
reports. All of these (reports,contacts,vehicles, and property) can
have multiple images attached to them. I could have:

1. Seperate tables for ReportImages,ContactImages,VehicleImages, and
PropertyImages all with there own pointers to the addresses.
I chose not to use this as it would mean 4 times the stored
procedures etc.

2. a single table with a column for parentID (int), and
ParentType(int).
I chose against this as most of the UI controls work better
with one value

3. a single table with a column for parentID (GUID)
i chose this method because I could select from one place
using one Column as the criteria.
This method also allowed me to find images based upon their
description(varchar(250)) no matter what parent they were associated
with. Sometimes a Smashed vehicle picture is attached to the report
not the vehicle and vice versa. Once I find the image then I can find
the parent record as needed.

well there is my defense hope it holds up.

tal

Jul 23 '05 #22

Don Vaillancourt

I have been designing databases for about 5 years. And from a
structural stand-point my designs have always been excellent. I have
always been able to add tables at a later time without falling into a
hole wishing I had designed the schema differently.

But, yes, I do lack knowledge in the area of what is the best types to
use for performance reasons. But I have never been corrected in my
selection of PK types even by well experienced DBAs.

What I am really interested in is if MSSQL (even, Oracle, MySQL, etc)
handles decimal differently than integer when generating keys and such.

Will an 8-byte BIGINT perform better than a 9-byte Decimal(18,0) when
dealing with millions of rows.

I wouldn't know how to test this since the time it would take to perform
such a test would probably differ in the nano-seconds if not
insignificant milli-seconds. Which is still even harder to test on a
multitasking OS when your result time is certainly always different.

--CELKO-- wrote:

Over the years I have always used the decimal(18,0) as the datatype

for primary keys. <<

That's wrong! The best key is the 17-digit Hebrew number that God puts
on the bottom of everything in creation!
Sounds pretty silly, doesn't it? But this the same thing as your
question.

There is no "magical, Universal, one-size-fits-all" key. There are
industry standards and natural keys, which you discover with research.
There are techniques for designing keys when the research fails --
check digits, grep patterns, validation rules, etc.

If you have been blindly writing DECIMAL(18,0) as the key on your
tables, then you probably have never actually designed an RDBMS.
Instead, you have been faking pointer chains and have not had any
relational keys.

You might want to get a course in the basics and learn exactly what a
key is. I would also do a full data audit on what you have now.

Jul 23 '05 #23

Erland Sommarskog

David Portas (RE****************************@acm.org) writes:

Say, two orders are placed simultaneously by different entities
(departments or whatever) and you don't record the information that
distinguishes them then how will you know which is which? The
artificial order number doesn't tell you that. In your hypothetical
scenario wouldn't the order confirmation emails go to the same address?
So even the customer's Accounts Payable department wouldn't know which
items on the order belonged to which department unless the end-user
recorded that information separately. Is there a benefit in issuing
multiple order numbers in this scenario? I'm not sure, but if there
were and you wanted to persist information about different, arbitrary
"sub-groups" of order items with all other attributes of the order
identical then in 3NF that information belongs as an attribute in the
Order Details table, not as a duplicate in the Order table. That's
unless the "department" attribute was recorded as well, in which case
Department would obviously become part of the Order table's key.

Most likely there is information to tell the orders apart, or if
they are duplicates. But that information is likely to be both
unpractical and uninteresting to use to identify the order.

The idea with a "natural key" as being something "outside the system"
may sound nice, but it does not model the reality for real-life system.

I thought "real-life" was "outside the system" but maybe your
definition of reality differs from mine. :-)

Yes, many real-life system lives outside the system known as theory.
All systems that stores customers for instance - another concept that
does not have any useful natural key...
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 23 '05 #24

David Portas

I differ on whether natural keys are "unpractical", "uninteresting" and
"not useful". There are plenty of scenarios in which natural keys are
not only useful but essential - data integration betweeen heterogeneous
data sources for example.

At least we do after all seem to agree that, with the right model,
natural keys always exist - whether you choose to implement them or not
is simply a decision to be made at design time.

--
David Portas
SQL Server MVP
--

Jul 23 '05 #25

Erland Sommarskog

David Portas (RE****************************@acm.org) writes:

I differ on whether natural keys are "unpractical", "uninteresting" and
"not useful". There are plenty of scenarios in which natural keys are
not only useful but essential - data integration betweeen heterogeneous
data sources for example.
I certainly don't wish to imply that natural keys never are useful. There
are certainly cases where they are.

But I like to point out that what is a natural key for one system, often
is key generated by another system, and thus originally an artificial key.
In fact, this can happen within a system as well. If I identify a position
by account number and instrument id, that is a natural key composed by
two artificial keys.
At least we do after all seem to agree that, with the right model,
natural keys always exist - whether you choose to implement them or not
is simply a decision to be made at design time.

Certainly a disagreement on "right". I claim that "customer" is a concept
that in many situations does not have a natural key. Even if you in could
in theory find information that you could use as a natural key, it would
be a gross error of system design to include in your database. And even
if you would include it, it would be very difficult to verify that you
actually have the correct values in your database.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 23 '05 #26

David Portas

You are correct that Customer data can pose some big challenges of data
verification. However, it would be an error NOT to enforce keys on this
data in my experience. Worst case is that you duplicate all the
customer attributes and allocate a new artifical key. In that scenario
there is likely to be little or no hope of ever determining which row
represents which physical customer or ensuring that referencing rows
point to the "right" customer. In other words you gain *nothing* from
having a redundant row in your table and you lose a lot from not having
a natural key. Use constraints to prevent duplicates and verify data at
entry or as part of a data integration process. There is plenty of
software out there to help verify and manage customer databases. It may
never be 100% accurate but it will be more useful and manageable with a
key than without one.

--
David Portas
SQL Server MVP
--

Jul 23 '05 #27

Erland Sommarskog

David Portas (RE****************************@acm.org) writes:

You are correct that Customer data can pose some big challenges of data
verification. However, it would be an error NOT to enforce keys on this
data in my experience. Worst case is that you duplicate all the
customer attributes and allocate a new artifical key. In that scenario
there is likely to be little or no hope of ever determining which row
represents which physical customer or ensuring that referencing rows
point to the "right" customer. In other words you gain *nothing* from
having a redundant row in your table and you lose a lot from not having
a natural key. Use constraints to prevent duplicates and verify data at
entry or as part of a data integration process. There is plenty of
software out there to help verify and manage customer databases. It may
never be 100% accurate but it will be more useful and manageable with a
key than without one.

Assume that you run a web shop. How do you verify new customers? Well,
many web shops seems to use a user id or the mail address. When a new
customer register, they should probably check that some other crucial
attrbutes are not already in use by other customers, such as e-mail address
(if you use a user id), national registration number and credit-card number.
However, few business would require you to register all three. In any case,
if a customer prefers to register a second time, there is very little you
can do to stop him.

For a system where the customers are registered by staff through a GUI, you
can add some checks to the registration form. And there is a whole lot of
point of having the checks there and not in the database, because such
checks can help to prevent the user from entering data by mistake, but
not stop him from entering data he actually has to enter. It's naïve to
believe that you thought of each and every case when you designed your
database.

For instance, in our system there is at one case where users will have to
register a customer a second time, and that is if the customers changes tax
countries in the middle of the year. Yes, this can be construed as a
weakness of our system that we don't keep a history of tax country.

Then again, the driving force when we develop our system is what our
customer are prepared to pay for, and not fulfilment of theoretic rules.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 23 '05 #28

David Portas

> Assume that you run a web shop. How do you verify new customers?
Well,

many web shops seems to use a user id or the mail address. When a new customer register, they should probably check that some other crucial attrbutes are not already in use by other customers, such as e-mail address (if you use a user id), national registration number and credit-card number. However, few business would require you to register all three. In any case, if a customer prefers to register a second time, there is very little you can do to stop him.
Isn't email address the key here? Sites I've used don't let me register
a second time with the same address and that seems like the only
sensible policy if the site issues its own user ID to the user. If the
user supplies his own user login name then that might be preferred to
email address as a key. In either case, I don't see a problem.
Then again, the driving force when we develop our system is what our
customer are prepared to pay for, and not fulfilment of theoretic

rules.

If the customer won't pay for it that is indeed a good reason not to do
it! Of course they may end up paying more in the end for you to come
back and fix it :-)

--
David Portas
SQL Server MVP
--

Jul 23 '05 #29

Erland Sommarskog

David Portas (RE****************************@acm.org) writes:

Isn't email address the key here? Sites I've used don't let me register
a second time with the same address and that seems like the only
sensible policy if the site issues its own user ID to the user. If the
user supplies his own user login name then that might be preferred to
email address as a key. In either case, I don't see a problem.

Sure, you can use an email as a key! But the difference between an
email address or a system-generated customer number is slim. Both are
unverifiable combinations of random bits. What you can verify with
an email address is that it has a an @ and a . and ends in a known domain.
Of course, you can send out a mail and ask for acknowledge before you
create the user. But that mail address could disappear just seconds
later.

One advantage with a customer number is that if I change my mail address,
I don't have to register as a new customer at the site.

What is really horrible is system that uses users names that looks like
email addresses, but aren't. This means that someone who feels like can
use someone else's mail address as his user name.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 23 '05 #30

--CELKO--

>> Sure, you can use an email as a key! But the difference between an
email address or a system-generated customer number is slim. Both are
unverifiable combinations of random bits. What you can verify with an
email address is that it has a an @ and a . and ends in a known domain.
<<

That grep() test and domain look up are VALIDATION processses. That
is, when I look at this value, can I tell if it is a possible domain
element.

VERIFICATION would involve a trusted external source. You contradict
yourself in the next sentence by giving a verification process (i.e.
send an email and get a reply).

Of course, you can send out a mail and ask for acknowledge before you create the user. But that mail address could disappear just seconds
later. <<

And so could system-generated customer number when that account is
closed. If a sequential customer is skipped, as often happens with
IDENTITY and other autonumbering procedural routines, how do you know?
If I use the skipped number somewhere, it is clearly valid, but how do
you verify it?

In the old days, there was a forms control officer who issued such
things and tracked all of the numbers. He controlled pre-printed forms
for the most part, but later had computer routines to issue blocks of
numbers (with check digits, in pseudo-random order, etc.) for various
forms.
One advantage with a customer number is that if I change my mail

address,
I don't have to register as a new customer at the site. <<

When I have changed my email, I log on under the old one and update my
profile on the custoemr sites I use. One more field on a form than
updating my mail address at the same time and I get a verification
email from it. I do not see that saving some typing outweighs the
extra security.

Jul 23 '05 #31

Erland Sommarskog

--CELKO-- (jc*******@earthlink.net) writes:

That grep() test and domain look up are VALIDATION processses. That
is, when I look at this value, can I tell if it is a possible domain
element.
It checks conformance to a certain syntactic pattern. If you auto-
generate a numeric id, you know that it conforms to the syntax: it's
numeric.
VERIFICATION would involve a trusted external source. You contradict
yourself in the next sentence by giving a verification process (i.e.
send an email and get a reply).

And you have to be careful not to interpret a bounce about unknown
address as a verified address.

But even if you verify the address correctly, all you know is that the
address existed at that point in time, and you have no idea whether
it relates to the person who registered as a customer.

Of course, you can send out a mail and ask for acknowledge before

you create the user. But that mail address could disappear just seconds
later. <<

And so could system-generated customer number when that account is
closed. If a sequential customer is skipped, as often happens with
IDENTITY and other autonumbering procedural routines, how do you know?
If I use the skipped number somewhere, it is clearly valid, but how do
you verify it?

The advantage with the generated number is that you know that it
is internal to the system - you are not fooling naïve people to believe
that it represents anything. Verification is simple: if we have a customer
with id 9876 in the system, it's a valid customer, else it's not.

As for gaps in the series, that depends on the business requirements.
If you can accept gaps, you can use IDENTITY. If you canont accept gaps,
IDENTITY is strictly verboten. In our system we have a couple of series
of numbers that must be contiguous (financial systems), and you always
request the number through a stored procedure. At some point, we found
that we had plenty of gaps. That caused us to add a log table which that
stored procedure writes to, so at least we can see when the missing
number was taken out. (We believe that the missing numbers were due to
code that first grabs a number, then processes the data only find
that there is no transaction to book. However, since we added the
table, we never had reason to analyse this more.)

Using IDENTITY or not for customer numbers is not likely to have much
importance. But for things for order numbers you gain scalability, since
the number series is a hot spot which is outside transaction scope.
All the time assuming, of course, that you can accept the gaps for
missing numbers. Check digits can always be arranged with using a
computed column as a the order number that is actually displayed.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 23 '05 #32

David Portas

Referential integrity is not the same thing as user authentication or
validation. It is quite true that a natural key in itself doesn't
gurarantee that the data is correct.

The original point I made here was simply an objection to your
statement that "there is no natural key" for an order from a web
customer. I think this discussion of the pros and cons of email
addresses versus assigned IDs underlines a crucial point: that there is
a choice. One may choose natural keys and/or artificial keys. There are
many factors that may influence those design decisions and not all have
to do with what is logical or by the book but it is a mistake to
believe that the choice doesn't exist. The idea that "there are no keys
in my data" is a delusion that I don't think any of us should
encourage. Beyond that, discussions about whether and when to use
artificial or surrogate keys in hypothetical scenarios are pretty
futile.

--
David Portas
SQL Server MVP
--

Jul 23 '05 #33

Erland Sommarskog

David Portas (RE****************************@acm.org) writes:

The original point I made here was simply an objection to your
statement that "there is no natural key" for an order from a web
customer. I think this discussion of the pros and cons of email
addresses versus assigned IDs underlines a crucial point: that there is
a choice. One may choose natural keys and/or artificial keys. There are
many factors that may influence those design decisions and not all have
to do with what is logical or by the book but it is a mistake to
believe that the choice doesn't exist. The idea that "there are no keys
in my data" is a delusion that I don't think any of us should
encourage. Beyond that, discussions about whether and when to use
artificial or surrogate keys in hypothetical scenarios are pretty
futile.

I'd say that an experienced database designer knows when he should go
looking for natural keys, and when he shouldn't. Trying to find the
"natural" key for an order is likely to be a waste of time. Using the
email address as the key for a customer and believing that it is
infinitely better than a internally generated number is a fallacy.

Yes, going for artificial keys by routine, could lead to that the
database get filled with bad data.

But a choice of a natural key based on an incorrect model of the world
can lead to a system that cannot accept legal data, and thus will be
hated by the users.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 23 '05 #34

Greg D. Moore \(Strider\)

"Don Vaillancourt" <do**@webimpact.com> wrote in message
news:us********************@nnrp1.uunet.ca...

I have been designing databases for about 5 years. And from a
structural stand-point my designs have always been excellent. I have
always been able to add tables at a later time without falling into a
hole wishing I had designed the schema differently.

But, yes, I do lack knowledge in the area of what is the best types to
use for performance reasons. But I have never been corrected in my
selection of PK types even by well experienced DBAs.

What I am really interested in is if MSSQL (even, Oracle, MySQL, etc)
handles decimal differently than integer when generating keys and such.

Will an 8-byte BIGINT perform better than a 9-byte Decimal(18,0) when
dealing with millions of rows.

I wouldn't know how to test this since the time it would take to perform
such a test would probably differ in the nano-seconds if not
insignificant milli-seconds. Which is still even harder to test on a
multitasking OS when your result time is certainly always different.

And this should ultimately tell you that it's irrelevant.

Jul 23 '05 #35

Greg D. Moore \(Strider\)

"Don Vaillancourt" <do**@webimpact.com> wrote in message
news:Lo********************@nnrp1.uunet.ca...

Well no, the PK would be an integer.
Really and how do you store X in that case?

Or a leading zero?

Greg D. Moore (Strider) wrote:
"Don Vaillancourt" <do**@webimpact.com> wrote in message
news:Ci********************@nnrp1.uunet.ca...
Over the years I have always used the decimal(18,0) as the datatype for
primary keys. Aside from the number of significant numbers involved,
would BigInt type be better for performance or is decimal(18,0) still

okay.

I think it depends on what your primary key is. If it's an ISBN for
example, I don't think either of these would work.

--
Don Vaillancourt
Director of Software Development
WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: do**@web-impact.com <mailto:do**@webimpact.com>
web: http://www.web-impact.com
Web Impact Inc. <http://www.web-impact.com>
This email message is intended only for the addressee(s) and contains
information that may be confidential and/or copyright.

If you are not the intended recipient please notify the sender by reply
email and immediately delete this email.

Use, disclosure or reproduction of this email by anyone other than the
intended recipient(s) is strictly prohibited. No representation is made
that this email or any attachments are free of viruses. Virus scanning
is recommended and is the responsibility of the recipient.

Jul 23 '05 #36

Similar topics