By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
444,137 Members | 2,209 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 444,137 IT Pros & Developers. It's quick & easy.

String vs Numeric Type

P: n/a
would it matter if I decided to make a primary key a string of numbers
vs actual numbers ? would it make any difference to DB2 in terms of
efficiency ? why would you choose numeric over a string for a key that
does not need to be in numeric format ?

thanks
--
reply to newsgroupsurferATyahooDOTcom
Nov 14 '05 #1
Share this Question
Share on Google+
54 Replies


P: n/a
Ian
Victor Mehta wrote:
would it matter if I decided to make a primary key a string of numbers
vs actual numbers ? would it make any difference to DB2 in terms of
efficiency ? why would you choose numeric over a string for a key that
does not need to be in numeric format ?


Probably the most significant difference at a physical database level is
the fact that INTEGERs require 4 bytes of storage, while you would need
CHAR(10) to store the same range of values.

However, this should never be decided at a physical database level.
Your data model will (should) always determine the proper domain for
your attributes.
Nov 15 '05 #2

P: n/a
Ian wrote:
Victor Mehta wrote:
would it matter if I decided to make a primary key a string of numbers
vs actual numbers ? would it make any difference to DB2 in terms of
efficiency ? why would you choose numeric over a string for a key that
does not need to be in numeric format ?

Probably the most significant difference at a physical database level is
the fact that INTEGERs require 4 bytes of storage, while you would need
CHAR(10) to store the same range of values.

However, this should never be decided at a physical database level.
Your data model will (should) always determine the proper domain for
your attributes.

Indeed! I've just been butting heads with a customer where a "device ID"
which is an "encoding" of "bit ranges" was implemented as INTEGER.
Next thing I hear is "Why doesn't DB2 support UNSIGNED INTEGER because
we loose the 0x8000000 bit."
After some patient explanation and pointing to little and big endian in
a mixed environment they are inching towards CHAR(4) FOR BIT DATA (aka
BINARY) now.
Data types are no afterthought....

Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
Nov 15 '05 #3

P: n/a
Ultimately, all data is stored as bits, so the data type INDEXed should
be irrelevant. Personally, i try to use INTEGER as my PRIMARY KEYs,
because i like to refer to TABLEs as objects, and the individual
records as instances of those objects. Having a COLUMN in most every
TABLE called "Id" and having it always be an INTEGER is very convenient
and consistant. Plus, if another TABLE is to REFERENCE this TABLE, a
long character string is not needed, just a (relatively) small INTEGER.

There are other reasons i like it. I find that when text is used for an
identifier, the text may change over time. A non-descript INTEGER
shouldn't and therefore makes an excellent PRIMARY KEY. I'll then make
the character field UNIQUE instead.

So, i simply find the use of an INTEGER to be better fitted.

A string can be used just as well.

B.

Nov 15 '05 #4

P: n/a
Brian Tkatch wrote:
Ultimately, all data is stored as bits, so the data type INDEXed should
be irrelevant.


It depends what is important to you. From a conceptual point you are right.
But practically, there are some differences that might be important.
Storing an integer as char(10) requires 10 bytes. That wastes space in the
index and makes comparison operations slower (byte-wise instead of a single
processor instruction). So the internal representation of the data does
matter.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Nov 16 '05 #5

P: n/a
Knut Stolze wrote:
Brian Tkatch wrote:
Ultimately, all data is stored as bits, so the data type INDEXed should
be irrelevant.

It depends what is important to you. From a conceptual point you are right.
But practically, there are some differences that might be important.
Storing an integer as char(10) requires 10 bytes. That wastes space in the
index and makes comparison operations slower (byte-wise instead of a single
processor instruction). So the internal representation of the data does
matter.


And comparing integers gives funny results if the strings are left
justified rather than right justified...

--
Jonathan Leffler #include <disclaimer.h>
Email: jl******@earthlink.net, jl******@us.ibm.com
Guardian of DBD::Informix v2005.02 -- http://dbi.perl.org/
Nov 16 '05 #6

P: n/a
Ah, i see i misread the original question for numric versus
alpha-numeric.

I might as well ask though....
Storing an integer as char(10) requires 10 bytes. That wastes space in the index and
makes comparison operations slower (byte-wise instead of a single processor instruction).


Isn't there something about compacting INDEXes? Or that when most of
the first part of the INDEX values are the same, it really doesn't
matter much when traversing the INDEX?

That is, being numbers are relatively the same:

0001
0002
0003
0004
0005
0006

Here the first three digits do not need to be searched at all since
they are always the same.

B.

Nov 16 '05 #7

P: n/a
Ian
Brian Tkatch wrote:
Ultimately, all data is stored as bits, so the data type INDEXed should
be irrelevant. Personally, i try to use INTEGER as my PRIMARY KEYs,
because i like to refer to TABLEs as objects, and the individual
records as instances of those objects. Having a COLUMN in most every
TABLE called "Id" and having it always be an INTEGER is very convenient
and consistant. Plus, if another TABLE is to REFERENCE this TABLE, a
long character string is not needed, just a (relatively) small INTEGER.


Oy. Data modelers are rolling in their graves. A generic "ID" column
is really, really confusing. At least give it a useful name, like
CUSTOMER_ID or ORDER_ID. What do you do when you need do use the "ID"
column from the CUSTOMER table in the ORDERS table (which already has
an "ID" column as the primary key?)

Nov 16 '05 #8

P: n/a
Jonathan Leffler wrote:
Knut Stolze wrote:
Brian Tkatch wrote:
Ultimately, all data is stored as bits, so the data type INDEXed should
be irrelevant.


It depends what is important to you. From a conceptual point you are
right. But practically, there are some differences that might be
important. Storing an integer as char(10) requires 10 bytes. That
wastes space in the
index and makes comparison operations slower (byte-wise instead of a
single
processor instruction). So the internal representation of the data does
matter.

And comparing integers gives funny results if the strings are left
justified rather than right justified...


Funny results when comparing integers is a minor issue when compared to
the potential problems in data entry. Many years ago, I worked on an
application that was storing (key value) integers in a character field
and discovered that there were a number of unique entries in the
database for key "1". Between left, right, center justification and the
inclusion or lack of leading zeros, it was impossible to locate the
correct data for this key.

Phil Sherman
Nov 16 '05 #9

P: n/a
Actually, it drives me batty when people do that. To me, nearly every
TABLE should have three standard COLUMNs with the *exact* same names,
Id, Name, Description. That way, each TABLE is an object, and each
record is an instantiation of that object. To do it otherwise, to me,
is to completely misunderstand what a database is.

If there is a COLUMN in the customer TABLE called Id, it *by defintion*
is the customer's id. Calling it "customer_id" is redundant.

When a TABLE REFERENCES another TABLE, the _TABLE name_ should be used,
*not* the COLUMN name. Thus, in the orders TABLE, the REFERECEing
COLUMN name should be "customer" not "customer_id".

B.

Nov 16 '05 #10

P: n/a
Brian Tkatch wrote:
Actually, it drives me batty when people do that. To me, nearly every
TABLE should have three standard COLUMNs with the *exact* same names,
Id, Name, Description. That way, each TABLE is an object, and each
record is an instantiation of that object. To do it otherwise, to me,
is to completely misunderstand what a database is.


I see it completely different. If you have a table CUSTOMER, then it should
have a column CUSTOMER_ID and CUSTOMER_NAME and not something abbreviated
where you don't see it origin in a 17-way join. At least that's what I try
to get into the brains of my students: use speaking names - no matter which
programming language you chose.

But hey, that really depends on the coding styles you have to adhere to.

p.s: There are no "objects" in tables; just tuples/rows and values.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Nov 21 '05 #11

P: n/a
>If you have a table CUSTOMER, then it should have a column CUSTOMER_ID and
CUSTOMER_NAME and not something abbreviated where you don't see it origin in a
17-way join.
It is never complicated if the COLUMN name is always qualified by it's
TABLE. That is, when querying just the "Customer" TABLE: SELECT Id FROM
Customer; Whereas SELECT CUSTOMER_ID FROM Customer; is both wordy and
redundant, especially when multiple COLUMNs are returned. However,
where there is a JOINed TABLE, the COLUMN should simply be qualified as
Customer.Id. This also ends up being more easily readable than
Customer.Customer_Id.

Further, when a COLUMN from another TABLE REFERENCEs this TABLE, using
the TABLE name in the COLUMN name (as opposed to using as a qualifier)
is frightfully confusing, as one may forget which TABLE the COLUMN
comes from, especially in the aforementioned "17-way join". SELECT
Customer_Id FROM Orders; When added to JOINs it becomes a nightmare to
follow.
But hey, that really depends on the coding styles you have to adhere to.
That much is true. But if someone will have the audacity to complain
that their style is better "Data modelers are rolling in their graves"
when in actuality (according to my understadning) he missed the entire
point of relational databases i will make my comment. :)
p.s: There are no "objects" in tables; just tuples/rows and values.


Perhaps physically, but not theoretcially. For theoretically, the
CREATE TABLE statement is the equivalent of the declaration of an
object in an OO language. Each row is then merely an instantiation of
it. In fact, a database could easily be implemented in that very
fashion.

And, i find that thinking of TABLEs as objects when designing a SCHEMA
for a specific project makes the entire designing process a much easier
task.

B.

Nov 21 '05 #12

P: n/a
Brian Tkatch wrote:
<snip>
Perhaps physically, but not theoretcially. For theoretically, the
CREATE TABLE statement is the equivalent of the declaration of an
object in an OO language. Each row is then merely an instantiation of
it. In fact, a database could easily be implemented in that very
fashion. Now if I were dead, I'd roll in my grave. (of course no one would care ;-)
If what you say were true we'd be posting in the ontos or objectstore
groups right now.. but we are not...

In the OO concept an instance/object has an OID.
In relational the purist selects a natural key. That is a key made from
properties of the rows in the table.
And that's pretty much the rub: abtract vs. natural keys.
The art is to find such a natural key. The point being made by
relational people is that your existing business process already
provides such a natural key.
E.g. in IBM an employee number is 6 digits plus 3 digits country code.
A department number, on the other hand, is 3-4 characters.
Your car has serial number - which is far from random, your social
ensurance number is not random either.
A relation design is modeled after the business.
There is no such thing a global unique ID in the real world.
(not to judge.. it's just the way it is ...)

Naming the customer ID customerID typically is not to avoid adding the
schema name, it's to distinguish it from the transactionID in the
transaction table where it acts as a child to the customer table in an
RI constraint.

In a BI world it can be beneficial for performance reasons to map
natural keys to abstract keys. That's where pragmatic reasons trump
realtional design. In OLTP that should hardly be the case.
Purely statistically speaking I see IDENTITY columns (the trademark of
abstract keys) most often in designs which underutilize the RDBMS,
degrading it to a persistent data store.
And, i find that thinking of TABLEs as objects when designing a SCHEMA
for a specific project makes the entire designing process a much easier
task.

You mean classes right?

Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
Nov 23 '05 #13

P: n/a
" In a BI world it can be beneficial for performance reasons to map
natural keys to abstract keys. That's where pragmatic reasons trump
realtional design. In OLTP that should hardly be the case.
"

Why always this desire to split BI and OLTP and being scared of natural
keys?
From C.J. Date Relational database Writings 1989-1991, Article:

Composite Keys: page 473:

"
The examples discussed in this article seem to suggest that composite
keys are acceptable, and even desirable, for characteristics and
associations.
"
Bernard Dhooghe

Nov 23 '05 #14

P: n/a
Bernard Dhooghe wrote:
" In a BI world it can be beneficial for performance reasons to map
natural keys to abstract keys. That's where pragmatic reasons trump
realtional design. In OLTP that should hardly be the case.
"
Why always this desire to split BI and OLTP and being scared of natural
keys? I think the issue is size. In a star schema surrogate keys can greatly
reduce the size of the table. The ain is simply bigger. I'm not stating
preference, just what I see.
From C.J. Date Relational database Writings 1989-1991, Article:

Composite Keys: page 473:

"
The examples discussed in this article seem to suggest that composite
keys are acceptable, and even desirable, for characteristics and
associations.
"

OK.. that speaks for natural keys...

Cheers
Serge

--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
Nov 23 '05 #15

P: n/a
>> Having a COLUMN in most every TABLE called "Id" and having it always be an INTEGER is very convenient and consistant. <<

Only if you do not know anything about data modeling and believe in
magical, universal identifiers. Do you also think that god put a
magical 17 digit Hebrew on everything in Creation? Not only is the
idea wrong, the execution is too vague to pass ISO-11179 rules -- it
begs the question "identifier of what??" -- to be is to be something in
particular (freshman logic? Aristotle?).

What you are doing manually with redundant data is mimicking a
sequential file recored number to get an exposed physical locator.
But then you also do not know that a column is not a field, so no
wonder you design things this way.

Nov 23 '05 #16

P: n/a
Okay, you are shoving an OO model into SQL, along with 1950's tape file
designs.

Let's get back to the basics of an RDBMS. Rows are not records; fields
are not columns; tables are not files. Data and metadata are never
mixed.
If there is a COLUMN in the customer TABLE called Id, it *by defintion* is the customer's id. Calling it "customer_id" is redundant. <<


Only if you design one table (file) at a time; if you were designed an
RDBMS, the data elements would be named for the thing they model at the
schema level. This is fundamental data modeling, not the tricky stuff.

Nov 23 '05 #17

P: n/a
Serge, thanx for the reply. I appreciate the thoughts.
And that's pretty much the rub: abtract vs. natural keys.
I use abstract keys for TABLEs that define objects, but use natural
when hashing two things together to form a unique join. That is,
Customer and Item will get their own Ids, but Order may use Customer,
Item, and a DATETIME field for its PK.

The reasoning is, that an object is defined by its instantiation, thus
the information is irrelevant to the object. That is, it simply does
not matter what the business's or employee's name is (they may even
change!) the only consistant thing is this row, and therefore it must
have an identifier.

An order (in this example), however, is defined by the association of
two defined objects at a specific time. They both define it, and are
unique to it. In that case an Identifier would simply be redundant.
You mean classes right?


Could be. I admit i do not have all my OO terms down. I mean the thing
that is used to instantiate a new object. Um.. a TABLE, yeah, that's
it. :)

B.

Nov 23 '05 #18

P: n/a
Joe, you do nothing but insult people. Please go away.

B.

Nov 23 '05 #19

P: n/a
Brian Tkatch wrote:
Serge, thanx for the reply. I appreciate the thoughts.

And that's pretty much the rub: abtract vs. natural keys.

I use abstract keys for TABLEs that define objects, but use natural
when hashing two things together to form a unique join. That is,
Customer and Item will get their own Ids, but Order may use Customer,
Item, and a DATETIME field for its PK.
The reasoning is, that an object is defined by its instantiation, thus
the information is irrelevant to the object. That is, it simply does
not matter what the business's or employee's name is (they may even
change!) the only consistant thing is this row, and therefore it must
have an identifier.

OK, now we're talking. At the end of the day the question is:
Where do these transaction IDs, customer IDs, employee IDs, etc.. come
from? If they come from some process then so be it. But if they don't
then the database is as good a place to generate these IDs as any other
place.
You are correct that some "natural keys" which may offer themselves
aren't all that good. I recall the mess I got into when my SIN changed
after I immigrated. My credit rating was good enough to by a house yet
not good enough to put a washing machine inside ;-)
And it's doubtful customers will surrender their finger prints or DNA as
a natural customer ID.

So, I think it's important to watch for those keys that are provided and
use them, but if none are provided who's to blame the DBMS to fill the
gap....?
Note that this is very different from: "All my tables have an identity
columns".

Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
Nov 23 '05 #20

P: n/a
Ian
Brian Tkatch wrote:
That much is true. But if someone will have the audacity to complain
that their style is better "Data modelers are rolling in their graves"
when in actuality (according to my understadning) he missed the entire
point of relational databases i will make my comment. :)

Brian,

Maybe making the comment about modelers rolling in their grave was a bit
harsh (sorry), but there is a big difference between design and style.

Design involves understanding the business significance of an attribute
and applying that understanding when creating modeling your business.
Style is coming up with names for your attributes, once you understand
what they are. I realize that this sounds like "calling the attribute
ID or CUSTOMER_ID is just style," but bear with me:

Naming 6 *different* attributes "ID" creates confusion, because when
you do this, context is *required* to understand which "ID" attribute
you are referring to.

More confusion arises when an attribute has a different name depending
on it's context (i.e. the attribute is called "ID" in the CUSTOMER
entity, but "CUSTOMER_ID" in the ORDER entity).

Granted, it can be tediou for DBAs and application developers when they
have to type out "CUSTOMER_ID" or "CUSTOMER.CUSOMTER_ID" in application
SQL or application code, but having an open, easy-to-understand data
model will help your business when a change happens or someone else has
to interpret the model.




Nov 23 '05 #21

P: n/a
>Note that this is very different from: "All my tables have an identity columns".

True. Though, that isn't what i said. What i did say was:

"To me, nearly every TABLE should have three standard COLUMNs with the
*exact* same names, Id, Name, Description. That way, each TABLE is an
object, and each record is an instantiation of that object."

The words "nearly every" were misleading, but that is true with the DB
i'm working with right now. I said them more to make the point to the
post i was replying to. The qualifier should have been "each TABLE that
defines an object".

B.

Nov 23 '05 #22

P: n/a
>Maybe making the comment about modelers rolling in their grave was a bit harsh

No, it wasn't. It was not meant that way. You meant it to make your
point, and that is fine.
Design involves understanding the business significance of an attribute
and applying that understanding when creating modeling your business.
Design is the understanding from the requirements what the
(theoretical) model they are trying to use is, and them implementing
it. Designing directly to the requirements is a poor choice, as it
misses the actual model, and falls short when requirements change.
Style is coming up with names for your attributes, once you understand what they are. I realize
that this sounds like "calling the attribute ID or CUSTOMER_ID is just style," but bear with me:
Style is the vantage point of the designer when naming the TABLEs, such
as if it is called "Customer" or "Customers". The actual name of the
TABLE should be self-evident from the object, relation, or event the
TABLE is defining.
Naming 6 *different* attributes "ID" creates confusion, because when you do this,
context is *required* to understand which "ID" attribute you are referring to.
Withall due respect, context is always required. Naming a column with
the TABLE name to get around providing context is poor design.
More confusion arises when an attribute has a different name depending on it's context (i.e. the
attribute is called "ID" in the CUSTOMER entity, but "CUSTOMER_ID" in the ORDER entity).
Here at least i agree you have a point. :)

However....

The child TABLE should have the TABLE name as the COLUMN. That is, in
the Customer TABLE it is called "Id" but in the Order TABLE it is
called "Customer". This works well with the default behavior of a
FOREIGN KEY in that it automatically REFERENCES the PRIMARY KEY. If the
PK on the parent object (referred to TABLEs seem to define objects) is
Id, it makes it even easier. This way, the child TABLEneed not know
anything about the parent TABLE.
Granted, it can be tediou for DBAs and application developers when they have to type out
"CUSTOMER_ID" or "CUSTOMER.CUSOMTER_ID" in application SQL or application code
Oh please... :)

The point is not that it is extra typing, but that it is redundant. The
entire point of normalization is to remove redundancy. From that
mindset how could one name the COLUMNs in redundant fashion?
, but having an open, easy-to-understand data model will help your business when
a change happens or someone else has to interpret the model.


Which is even easier when using Id, Name, and Description in all object
TABLEs.

B.

Nov 23 '05 #23

P: n/a
Brian Tkatch wrote:
p.s: There are no "objects" in tables; just tuples/rows and values.
Perhaps physically, but not theoretcially. For theoretically, the
CREATE TABLE statement is the equivalent of the declaration of an
object in an OO language. Each row is then merely an instantiation of
it. In fact, a database could easily be implemented in that very
fashion.


Grab a good book on the relational data model and read it.

There are NO objects in SQL. Everything is based on values.
And, i find that thinking of TABLEs as objects when designing a SCHEMA
for a specific project makes the entire designing process a much easier
task.


Q: define the term "object"

I said "there are no objects _in_ tables," explicitly avoiding this very
issue. That's because I too like to talk about tables as "objects", which
implies being terribly vague...

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Nov 23 '05 #24

P: n/a
Brian Tkatch wrote:
the only consistant thing is this row, and therefore it must
have an identifier.


Internally yes, there is a row-id, RID, record-id, tuple-id, TID, or
whatever else you like to call it. Externally, you don't have such an ID
at all in SQL - note the multi-set semantics that come with SQL.

The relational model defines a set semantics. If you do not have an
explicit key defined (pollibly projected it away), then all attributes of
the relation form the key.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Nov 23 '05 #25

P: n/a
Ian wrote:
Granted, it can be tedious for DBAs and application developers when they
have to type out "CUSTOMER_ID" or "CUSTOMER.CUSOMTER_ID" in application


Actually, that argument doesn't hold any water in the first place. It will
take longer to think about coding the query than to actually type it in.
So you can't really complain about "lost time".

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Nov 23 '05 #26

P: n/a
The problem with this is you can have columns with similar data within
a table. A customer table may have a home address field with a state
code as well as a bill-to address with a state field or a record may
contain a person's data of birth and date of death. You can't call
both fields 'state' or 'date'. You have to be clearer.

Also, with several tables having an 'ID' column but of different types
(machine.id, customer.id, location.id, etc) and with the tendency for
SQL programmers to use single letters as table aliases, you can have
select that looks like "select a.id, b.id, c.id, d.id,.... from
customer a, machine b, location c, ... where a.id <> b.id and c.id =
d.id ....". especially in a complex 17 table join with subqueries it
could take a new programmer days just to determine what is going on
with hundreds or thousands of lines of SQL code written this way. I
think the extra time it takes to make something clearer saves time and
effort later on when modifications have to be made.

Nov 23 '05 #27

P: n/a
>If you do not have an explicit key defined (pollibly projected it away), then all attributes of the relation form the key.

In the Customer TABLE, for example, the specifics do not form the key.
The fact that there is a row there is what forms the key. In fact, the
Customer is welcome to have two separate accounts, in which case, every
single piece of information is the same beside an added identifier.

As it just so happens, the Customer is *usually* unique, and because of
that a business rule--not a data rule--forces the name to be unique.
And, it is my contention, that the PK should never be based on a
business rule.

B.

Nov 23 '05 #28

P: n/a
Brian Tkatch wrote:
If you do not have an explicit key defined (pollibly projected it away),
then all attributes of the relation form the key.


In the Customer TABLE, for example, the specifics do not form the key.
The fact that there is a row there is what forms the key. In fact, the
Customer is welcome to have two separate accounts, in which case, every
single piece of information is the same beside an added identifier.


I was talking about the relational model - not what SQL made of it. There
is a big difference between both!

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Nov 23 '05 #29

P: n/a
Doesn't the relational model rely on certain objects (TABLEs) being
defined in order to make the relations in between them?

B.

Nov 23 '05 #30

P: n/a
Heh.

Good point Knut. :)

B.

Nov 23 '05 #31

P: n/a
Brian Tkatch wrote:
Doesn't the relational model rely on certain objects (TABLEs) being
defined in order to make the relations in between them?


There are no tables in RM. Just relations that have attributes and a set of
operations that can be applied to the relations, i.e. joins, selection and
projection. That's it.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Nov 23 '05 #32

P: n/a
>A customer table may have a home address field with a state
code as well as a bill-to address with a state field or a record may
contain a person's data of birth and date of death. You can't call
both fields 'state' or 'date'. You have to be clearer.
A) I only said this regarding Id, Name, and Description.
B) State is the customer's state, Bill_To_State is the bill to state. I
don't see an issue.
and with the tendency for SQL programmers to use single letters as table aliases,
A) A database design should not be destoyed just so some bad programmer
can do a bad job.
B) A database design should be made before any compromises are made.
The design allows for many COLUMNs called "Id", should you want to
compromise afterwards for the programmer that's okey, but that by no
means is part of the model.
especially in a complex 17 table join with subqueries it could take a new programmer days just to determine what is going on


A) If you have a 17-way join where the aliases are all one letter, you
should fire all your programmers and get people who think.
B) It should take a new programmer "days" to figure out a new query
that contains a 17-way join. If he says he understands this new join
early on, he's pulling your leg.

B.

Nov 23 '05 #33

P: n/a
OK, if that is the relational model, then i admit, i know nothing about
the relational model. (And am quite happy about that too.)

B.

Nov 23 '05 #34

P: n/a
Brian Tkatch wrote:
OK, if that is the relational model, then i admit, i know nothing about
the relational model. (And am quite happy about that too.)


The relational model provides the mathematical foundation for SQL. So I
would say it's an extremely good thing if all folks working with relational
database systems would know about it. Oh well, we don't live in
paradise...

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Nov 23 '05 #35

P: n/a
Brian Tkatch wrote:
especially in a complex 17 table join with subqueries it could take a new
programmer days just to determine what is going on


A) If you have a 17-way join where the aliases are all one letter, you
should fire all your programmers and get people who think.
B) It should take a new programmer "days" to figure out a new query
that contains a 17-way join. If he says he understands this new join
early on, he's pulling your leg.


Ah, but that's the point: if it takes "days", something wasn't done quite
optimal before.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Nov 23 '05 #36

P: n/a
Brian wrote: <The point is not that it is extra typing, but that it is
redundant. The
entire point of normalization is to remove redundancy. From that
mindset how could one name the COLUMNs in redundant fashion?>

When the RM talks about removing redundancy, it's talking about
removing redundant data withing the tables. It is not refering to the
table column names at all, just the data within the tables.

Patrick

Nov 23 '05 #37

P: n/a
>Ah, but that's the point: if it takes "days", something wasn't done quite optimal before

It was hiring those bad programmers. :)

Assuming:

1) A "17 way join" is actually required.
2) This is a new join to this programmer.

Then it should take a while. Granted, the original design may be poor,
but that has nothing to do with the aliases. Even a well commented
query split into the individual sets of WHERE clauses that effect the
join, to understand the relations between so many TABLEs (when the
relation is new to the programmer) is going to take a while.

B.

Nov 23 '05 #38

P: n/a
Brian Tkatch wrote:
Ah, but that's the point: if it takes "days", something wasn't done quite
optimal before


It was hiring those bad programmers. :)

Assuming:

1) A "17 way join" is actually required.
2) This is a new join to this programmer.

Then it should take a while. Granted, the original design may be poor,
but that has nothing to do with the aliases. Even a well commented
query split into the individual sets of WHERE clauses that effect the
join, to understand the relations between so many TABLEs (when the
relation is new to the programmer) is going to take a while.


Well, have it your way.

My experience is that I have a much easier time understanding code (SQL or
other programming languages) if the names of variables are self-describing.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Nov 23 '05 #39

P: n/a
>Well, have it your way.

I will, thank you. :)

And you may have it yours. BTW, i enjoy reading what you write here. I
learn quite a bit by reading your's and Serge's responses.
My experience is that I have a much easier time understanding code (SQL or
other programming languages) if the names of variables are self-describing.


I believe in that, but i also demand that the user check for context.
So, whilst i use descriptive names, i do not provide context in the
name itself, that is what the TABLE name is for.

Variables, however, which do not have such context, is something i
usually give longer names to, complete with underscores (and Mixed
Case) for easy readability.

My general feeling is, if the person doesn't know the context, i don't
want him touching anything anyway.

B.

Nov 23 '05 #40

P: n/a
Brian Tkatch wrote:
And you may have it yours. BTW, i enjoy reading what you write here. I
learn quite a bit by reading your's and Serge's responses.


Thanks. I thought I might have been, err... too direct. I did not intent
to insult you in any way. If I did, you have my sincere apologies.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
Nov 23 '05 #41

P: n/a
No insults at all. I was just repeating your comment. :)

I don't usually find people insulting. Just certain individuals seem to
have a bad attitude.

B.

Nov 23 '05 #42

P: n/a
No, I educate them if they will listen and do a minimal amount of
research. Unfortunately, we have a lot of programmers who only want
kludges and for their "wish magic" to change the nature of data for
them.

Go read some RDBMS classics and learn why what you are doing is
fundamentally wrong. And quit whining about your feelings; watch a
episode of HOUSE and see where I can coming from.

Nov 25 '05 #43

P: n/a
Ian wrote:
Design involves understanding the business significance of an attribute
and applying that understanding when creating modeling your business.
Style is coming up with names for your attributes, once you understand
what they are. I realize that this sounds like "calling the attribute
ID or CUSTOMER_ID is just style," but bear with me:


Another interesting consideration that has emerged is the use of the
Ruby on Rails framework. This framework drives application behavior
automatically from the data model - as long as you follow certain
conventions, such as:

- table names use plural: cars, books, etc
- table names have auto incrementing primary keys
- primary keys are named just: id
- foreign keys names also carry info about relationship type (1:many,
etc), so a foreign key from the stories table to the authors table
would be named author_id

This is not how I'd usually create a model - I prefer entity names to
be singular, primary keys to have entity name in them (or shortened
version anyway). and prefer primary & foreign keys to have same name

But Ruby on Rails is probably the coolest application framework I've
seen to date, and has enough going for it that I'm willing to overlook
the modeling issues. Or theoretically create a set of ROR specific
views to make it work with models that don't comply with its
conventions.

Nov 28 '05 #44

P: n/a
In OO terms the creation of a table is the instantiation of an object
since the table contains both data and methods (constraints). Adding
another row to the table only adds more data to an existing object. It
is the table that has the methods, not the individual rows.

Patrick

Nov 28 '05 #45

P: n/a
Hi!

I our database EVERY table has an ID that is BIGINT and get's generated
using a SEQUENCE. And EVERY table also has it's own natural key.
We went with this approach because all the JOINS are so damn easier to write
with IDs in place instead of being forced to use natural keys over and over
again.

Best regards,
Kovi

Serge Rielau wrote:
Brian Tkatch wrote:
Serge, thanx for the reply. I appreciate the thoughts.

And that's pretty much the rub: abtract vs. natural keys.

I use abstract keys for TABLEs that define objects, but use natural
when hashing two things together to form a unique join. That is,
Customer and Item will get their own Ids, but Order may use Customer,
Item, and a DATETIME field for its PK.
> The reasoning is, that an object is defined by its instantiation, thus
the information is irrelevant to the object. That is, it simply does
not matter what the business's or employee's name is (they may even
change!) the only consistant thing is this row, and therefore it must
have an identifier.

OK, now we're talking. At the end of the day the question is:
Where do these transaction IDs, customer IDs, employee IDs, etc.. come
from? If they come from some process then so be it. But if they don't
then the database is as good a place to generate these IDs as any other
place.
You are correct that some "natural keys" which may offer themselves
aren't all that good. I recall the mess I got into when my SIN changed
after I immigrated. My credit rating was good enough to by a house yet
not good enough to put a washing machine inside ;-)
And it's doubtful customers will surrender their finger prints or DNA as
a natural customer ID.

So, I think it's important to watch for those keys that are provided and
use them, but if none are provided who's to blame the DBMS to fill the
gap....?
Note that this is very different from: "All my tables have an identity
columns".

Cheers
Serge


Nov 30 '05 #46

P: n/a
Gregor Kovac wrote:
Hi!

I our database EVERY table has an ID that is BIGINT and get's generated
using a SEQUENCE. And EVERY table also has it's own natural key.
We went with this approach because all the JOINS are so damn easier to write
with IDs in place instead of being forced to use natural keys over and over
again.

Couple of comments:
1. How do you enforce the functional dependency?
2. How do you index? I presume your app wants to see the natural key,
yet your joins operate on the abstract key. That means that you end up
with ISCAN-FETCH combinations instead of pure ISCAN on covering indexes.
You have traded lazy fingers for performance. Assuming you buy your
hardware from IBM I'm OK with that :-)
There is a point to be made for "NATURAL" JOINs at this point though (A
feature not supported by DB2). NATURAL JOINS require a good column
naming methodology though (same column names for foreign and primary key)

Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
Nov 30 '05 #47

P: n/a
Hi!

1.) We create unique indexes on natural keys. I hope I understood you OK.
2.) We have two kinds of indexes: primary key has one and natural key is one
index. Then we add more when required for performance reasons.

Yes, we are ALL-IBM shop :) And yes, I've seen a lot of those INDEX SCAN +
FETCH items in explains. Hmm.. Is there a big difference in performance
between INDEX SCAN + FETCH as opposed to only INDEX SCAN? If there is then
I'll have to convince my coworkers not to be so lazy. :)

Best regards,
Kovi

Serge Rielau wrote:
Gregor Kovac wrote:
Hi!

I our database EVERY table has an ID that is BIGINT and get's generated
using a SEQUENCE. And EVERY table also has it's own natural key.
We went with this approach because all the JOINS are so damn easier to
write with IDs in place instead of being forced to use natural keys over
and over again.

Couple of comments:
1. How do you enforce the functional dependency?
2. How do you index? I presume your app wants to see the natural key,
yet your joins operate on the abstract key. That means that you end up
with ISCAN-FETCH combinations instead of pure ISCAN on covering indexes.
You have traded lazy fingers for performance. Assuming you buy your
hardware from IBM I'm OK with that :-)
There is a point to be made for "NATURAL" JOINs at this point though (A
feature not supported by DB2). NATURAL JOINS require a good column
naming methodology though (same column names for foreign and primary key)

Cheers
Serge


Nov 30 '05 #48

P: n/a
>Is there a big difference in performance between
INDEX SCAN + FETCH as opposed to only INDEX SCAN?


[I'll give it a shot.]

Absolutely.

If all the information is on the INDEX, there is no reason to look at
the row for the data. If the information is not on the INDEX, it first
uses the INDEX to find what records it needs, it then grabs row
addresses from the INDEX and has to go (once again) to the disk for the
actual values. So, there's more I/O, and that always adds time.

If the TABLE is completely stored in memory, the impact (in this
specific operation) is mitigated.

--

Anyway, don't take my word for it. Nothing makes the point like trying
it yourself. For an experiment, write a query that uses the INDEX to
get the values of a COLUMN not in the INDEX. Then, CREATE another INDEX
with the searched COLUMN and the COLUMN you want to grab. The timing on
the second should be much less than the first.

B.

Nov 30 '05 #49

P: n/a
Serge,
Couple of comments:
1. How do you enforce the functional dependency?
Hmm, I've never seen anyone attempt to enforce functional dependency
between an artificial & natural key. Is there a method that you had in
mind for this? Note, that I haven't found this to be a serious source
of data quality problems either tho...
2. How do you index? I presume your app wants to see the natural key,
yet your joins operate on the abstract key. That means that you end up
with ISCAN-FETCH combinations instead of pure ISCAN on covering indexes.
You have traded lazy fingers for performance.


Is it really that bad? Many of the queries that I have that use
artificial ids also refer to non-key attributes in the query anyway.
These wouldn't benefit from the natural key, but the index io certainly
must benefit from use of a 4-byte smallint vs composit key with
multiple large varchars.

Plus, these days labor costs so much more than hardware, and business
rules change so fast, that the future-proofing you get from artificial
keys can be worth any theoretical performance problem. I know one
system that has business rule changes every month - as new customers
with slightly different needs are added, new source systems are fed
into it, etc. Assumptions made about natural keys for one set of data
often fails when the next set is added. This is a data warehouse,
which is much more susceptible to this kind of a problem than a small
OTLP database though.

Nov 30 '05 #50

54 Replies

This discussion thread is closed

Replies have been disabled for this discussion.