Bytes IT Community

Updating the SQL key value

In an application I am writing, the user can define a series of
steps to be followed. I save them in a SQL database using the
field "order" (a smallint) as the primary key.
(There are around 20 steps.)

On the admin page the steps are listed, in "order" order and the
user can create new steps and assign an order and all is well.

The problem may come in using a renumber function, which should
take the steps in their current order and reassign the "order"
key, assigning each step an "order" that is the prior "order" + 10.

In other databases I have worked with this is a major pain
because as soon as you save a record the order may change.
OTOH, in mysql it appears that if you do a select for the whole
table you have them all in memory and can change the value of the
field "order" without having to worry about getting that record
again and then save the whole table.

Is that correct ?

bill
Mar 28 '07 #1
33 Replies


bill wrote:

Hi Bill,
In an application I am writing the user can define a series of
steps to be followed. I save them in a sql database using the
field "order" (a smallint) as the primary key.
(there are in the range of 20 steps)

On the admin page the steps are listed, in "order" order and the
user can create new steps and assign an order and all is well.

The problem may come in using a renumber function which should
take the steps in their current order and reassign the "order"
key, assigning each set an "order" that is the prior "order" + 10.
Well, what about:

UPDATE tblwhatever SET `order` = `order` + 10;

>
In other databases I have worked with this is a major pain
because as soon as you save a record the order may change.
I have never heard of a database that changes the values of other rows when
you insert a new row.
Are you maybe using this "order" field as the primary key?

If so: that is a major design mistake.
(I'll elaborate if this turns out to be the case.)
OTOH, in mysql it appears that if you do a select for the whole
table you have them all in memory and can change the value of the
field "order" without having to worry about getting that record
again and then save the whole table.

Is that correct ?
No.

A query is a query: something that returns results (or none).
If you SELECT something, nothing is changed.

I don't understand what you mean by 'selecting the whole table in memory'.
If you need to UPDATE certain records, use the UPDATE command to do so.
>
bill
Regards,
Erwin Moller
Mar 28 '07 #2

bill wrote:
In an application I am writing the user can define a series of steps to
be followed. I save them in a sql database using the field "order" (a
smallint) as the primary key.
(there are in the range of 20 steps)

On the admin page the steps are listed, in "order" order and the user
can create new steps and assign an order and all is well.

The problem may come in using a renumber function which should take the
steps in their current order and reassign the "order" key, assigning
each set an "order" that is the prior "order" + 10.

In other databases I have worked with this is a major pain because as
soon as you save a record the order may change.
OTOH, in mysql it appears that if you do a select for the whole table
you have them all in memory and can change the value of the field
"order" without having to worry about getting that record again and then
save the whole table.

Is that correct ?

bill
Bill,

Try asking in comp.databases.mysql. This is a PHP newsgroup.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Mar 28 '07 #3

Erwin Moller wrote:
bill wrote:

Hi Bill,
>In an application I am writing the user can define a series of
steps to be followed. I save them in a sql database using the
field "order" (a smallint) as the primary key.
(there are in the range of 20 steps)

On the admin page the steps are listed, in "order" order and the
user can create new steps and assign an order and all is well.

The problem may come in using a renumber function which should
take the steps in their current order and reassign the "order"
key, assigning each set an "order" that is the prior "order" + 10.

Well, what about:

UPDATE tblwhatever SET `order` = `order` + 10;
I think I was unclear:
the order might evolve to:

10
12
16
18
30
33

and after renumbering it should be back to
10
20
30
40
50
60
>
>In other databases I have worked with this is a major pain
because as soon as you save a record the order may change.

I never heard of a database that changes the values of other rows if you
insert a new row.
Are you maybe using this order filed as Primary Key?

If so: That is a major designmistake.
(I'll elaborate if this turns out to be the case.)
guilty.
That is why the updating of the "order" would result in the
problem.

If I use another primary key (an arbitrary ID) then I still have
to read the records in "order" number and reassign the value of
"order" and then read the next record.
>
>OTOH, in mysql it appears that if you do a select for the whole
table you have them all in memory and can change the value of the
field "order" without having to worry about getting that record
again and then save the whole table.

Is that correct ?

No.
bummer
>
A query is a query: something that return results (or none).
If you SELECT something, nothing is changed.

I don't understand what you mean by 'selecting the whole table in memory'.
If you need to UPDATE certain records, use the UPDATE command to do so.
but after having UPDATEd, one still has the problem of reading
the next record
>
>bill

Regards,
Erwin Moller
your assistance is appreciated

bill
Mar 29 '07 #4

bill wrote:
Erwin Moller wrote:
>bill wrote:

Hi Bill,
>>In an application I am writing the user can define a series of
steps to be followed. I save them in a sql database using the
field "order" (a smallint) as the primary key.
(there are in the range of 20 steps)

On the admin page the steps are listed, in "order" order and the
user can create new steps and assign an order and all is well.

The problem may come in using a renumber function which should
take the steps in their current order and reassign the "order"
key, assigning each set an "order" that is the prior "order" + 10.

Well, what about:

UPDATE tblwhatever SET `order` = `order` + 10;

I think I was unclear:
the order might evolve to:

10
12
16
18
30
33

and after renumbering it should be back to
10
20
30
40
50
60
Aha. I see now what you mean.
Hmm, that is tricky to do with 1 UPDATE statement since you didn't use a
'normal' autonumbering Primary Key.
I think you have to resort to a simple script to do that for you.

Can we assume that you NEVER have any duplicate values for 'order'?
I mean, did you make that field the PK, or at least UNIQUE? Or, as a last
resort, do you check before inserting that the 'order' value didn't exist yet?

If they are all UNIQUE, try something like:
-- pseudocode, ADODB-like; adjust to your own preferred database logic

$SQL = "SELECT `order` FROM tblorder ORDER BY `order`";
$result = $connection->Execute($SQL)->getArray();

$count = 10;
foreach ($result as $oneOrder){
$oldOrderNum = $oneOrder["order"];
// renumber this row, then advance to the next multiple of 10
// (caution: if `order` is the PK, a new value can collide with a
// row that has not been renumbered yet)
$SQL = "UPDATE tblorder SET `order`=".$count;
$SQL .= " WHERE (`order`=".$oldOrderNum.");";
$connection->Execute($SQL);
$count += 10;
}
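If the table lives in MySQL, the same renumbering can also be done server-side with a user variable; a minimal sketch, assuming the table and column names from the pseudocode above (tblorder, `order`):

```sql
-- MySQL allows ORDER BY on a single-table UPDATE, and the user
-- variable @n is re-evaluated for each row, so the rows end up
-- renumbered 10, 20, 30, ... in their current order.
SET @n := 0;
UPDATE tblorder
SET `order` = (@n := @n + 10)
ORDER BY `order`;
```

One caveat: if `order` is the primary key, a row's new value can transiently collide with a not-yet-updated row and abort the statement; shifting every value out of range first (e.g. UPDATE tblorder SET `order` = `order` + 1000) avoids that.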
>>
>>In other databases I have worked with this is a major pain
because as soon as you save a record the order may change.

I never heard of a database that changes the values of other rows if you
insert a new row.
Are you maybe using this order filed as Primary Key?

If so: That is a major designmistake.
(I'll elaborate if this turns out to be the case.)

guilty.
That is why the updating of the "order" would result in the
problem.

If I use another primary key (an arbitrary ID) then I still have
to read the records in "order" number and reassign the value of
"order" and then read the next record.
In this case: maybe.

But take it from an old db-fart like me: start using an autonumbering PK for
each table you create from now on, unless you have a compelling reason not
to. I have never been in a situation where an autonumbering PK hurts.
Only in some cases do you know it really doesn't make sense (scratch tables,
temp tables, etc).
As a rule of thumb: use an autonumbering PK on every table.

With one in my example above, I wouldn't need to worry that some values for
'order' are the same (which would make that approach fail).
An autonumbering PK would have eliminated that problem.

>>
>>OTOH, in mysql it appears that if you do a select for the whole
table you have them all in memory and can change the value of the
field "order" without having to worry about getting that record
again and then save the whole table.

Is that correct ?

No.

bummer
Unless you mean my script above. That takes all the 'order' values from
tblorder into memory, but that is independent of the database; it is PHP's
memory.
>>
A query is a query: something that return results (or none).
If you SELECT something, nothing is changed.

I don't understand what you mean by 'selecting the whole table in
memory'. If you need to UPDATE certain records, use the UPDATE command to
do so.

but after having UPDATEd, one still has the problem of reading
the next record
Not if you first read them all into PHP's memory at once and then start
updating.
>>
>>bill

Regards,
Erwin Moller

your assistance is appreciated
You're welcome
bill
Regards,
Erwin Moller
Mar 29 '07 #5

bill wrote:
The problem may come in using a renumber function which should
take the steps in their current order and reassign the "order"
key, assigning each set an "order" that is the prior "order" + 10.
Something like:

<?php
$db = new PDO(/* connection settings */);
$db->query("ALTER TABLE foobar ADD tempcol integer;");
$update = $db->prepare("UPDATE foobar SET tempcol=? WHERE `order`=?;");
$count = 0;
foreach ($db->query("SELECT `order` FROM foobar ORDER BY `order`;") as $row)
{
    $count += 10;
    // execute() takes its parameters as a single array
    $update->execute(array($count, $row['order']));
}
$db->query("UPDATE foobar SET `order`=tempcol;");
$db->query("ALTER TABLE foobar DROP tempcol;");
unset($count, $update);
?>

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!
Mar 29 '07 #6

On Mar 28, 3:50 pm, bill <nob...@spamcop.net> wrote:
In other databases I have worked with this is a major pain
because as soon as you save a record the order may change.
OTOH, in mysql it appears that if you do a select for the whole
table you have them all in memory and can change the value of the
field "order" without having to worry about getting that record
again and then save the whole table.
I am working on a similar application, where I have steps in a certain
business process, but I don't use the order number as a primary key.

I don't think you should rely on such a MySQL feature,
because it could change in the next release.

My advice is to always repeat the query after renumbering.

DG

Mar 29 '07 #7

Erwin Moller wrote:
bill wrote:
>Erwin Moller wrote:
>>bill wrote:

Hi Bill,

In an application I am writing the user can define a series of
steps to be followed. I save them in a sql database using the
field "order" (a smallint) as the primary key.
(there are in the range of 20 steps)

On the admin page the steps are listed, in "order" order and the
user can create new steps and assign an order and all is well.

The problem may come in using a renumber function which should
take the steps in their current order and reassign the "order"
key, assigning each set an "order" that is the prior "order" + 10.
Well, what about:

UPDATE tblwhatever SET `order` = `order` + 10;
I think I was unclear:
the order might evolve to:

10
12
16
18
30
33

and after renumbering it should be back to
10
20
30
40
50
60

Aha. I see now what you mean.
Hmm, that is tricky to do with 1 UPDATE statement since you didn't use a
'normal' autonumbering Primary Key.
I think you have to resort to a simple script to do that for you.

Can we assume that you NEVER have any double values for 'order'?
I mean, did you make that field PK or at least UNIQUE? Or as a last resort,
check before inserting that the 'order' didn't exist yet?

If they are all UNIQUE try something like:
-- pseudocode, ADODB-like; adjust to your own preferred database logic

$SQL = "SELECT `order` FROM tblorder ORDER BY `order`";
$result = $connection->Execute($SQL)->getArray();

$count = 10;
foreach ($result as $oneOrder){
$oldOrderNum = $oneOrder["order"];
// renumber this row, then advance to the next multiple of 10
$SQL = "UPDATE tblorder SET `order`=".$count;
$SQL .= " WHERE (`order`=".$oldOrderNum.");";
$connection->Execute($SQL);
$count += 10;
}
>>>
In other databases I have worked with this is a major pain
because as soon as you save a record the order may change.
I never heard of a database that changes the values of other rows if you
insert a new row.
Are you maybe using this order filed as Primary Key?

If so: That is a major designmistake.
(I'll elaborate if this turns out to be the case.)
guilty.
That is why the updating of the "order" would result in the
problem.

If I use another primary key (an arbitrary ID) then I still have
to read the records in "order" number and reassign the value of
"order" and then read the next record.

In this case: maybe.

But take it from an old db-fart like me: Start using an autonumbering PK for
each table you create from now on, unless you have a compelling reason not
to. I actually was never in a situation where an autonumbering PK hurts.
Only in some cases you really know it doesn't make sense (scratch tables,
temp tables, etc).
As a rule of thumb: Use autonumbering PK on every table.

In my above example I wouldn't need to be afraid you have some values for
'order' that are the same (which would make that approach fail).
A autonumbering PK would have eliminated that problem.
Understood. I will add an autonumbering PK.
>
>>>OTOH, in mysql it appears that if you do a select for the whole
table you have them all in memory and can change the value of the
field "order" without having to worry about getting that record
again and then save the whole table.

Is that correct ?
No.
bummer

Unless you mean my above script. That takes the all 'order' values from
tblorder in memory, but that is independent from the database, it is PHP's
memory.
Works for me.
Thanks
>
>>A query is a query: something that return results (or none).
If you SELECT something, nothing is changed.

I don't understand what you mean by 'selecting the whole table in
memory'. If you need to UPDATE certain records, use the UPDATE command to
do so.
but after having UPDATEd, one still has the problem of reading
the next record

Not if you read them all at once first into memory of php and start updating
then.
>>>bill
Regards,
Erwin Moller
your assistance is appreciated

You're welcome
>bill

Regards,
Erwin Moller
I certainly appreciate the tutorial. I understand the internals
of MySQL just a little bit better.

bill
Mar 30 '07 #8

Toby A Inkster wrote:
bill wrote:
>The problem may come in using a renumber function which should
take the steps in their current order and reassign the "order"
key, assigning each set an "order" that is the prior "order" + 10.

Something like:

<?php
$db = new PDO(/* connection settings */);
$db->query("ALTER TABLE foobar ADD tempcol integer;");
$update = $db->prepare("UPDATE foobar SET tempcol=? WHERE `order`=?;");
$count = 0;
foreach ($db->query("SELECT `order` FROM foobar ORDER BY `order`;") as $row)
{
    $count += 10;
    // execute() takes its parameters as a single array
    $update->execute(array($count, $row['order']));
}
$db->query("UPDATE foobar SET `order`=tempcol;");
$db->query("ALTER TABLE foobar DROP tempcol;");
unset($count, $update);
?>
Lovely idea. For my small number of rows a PHP array probably works
better, but I do like this approach and will archive it.

One additional question: does the foreach loop terminate because
the query returns false?

bill
Mar 30 '07 #9

bill wrote:
One additional question: does the foreach loop terminate because
the query return false ?
It terminates as soon as the query has run out of rows.

It's a PHP 5 feature called "Iterators": PDO queries return objects
that, although they are not arrays, can be traversed like arrays in a
foreach loop. The solution would work equally well using PHP 4
constructs, but the code doesn't look as neat.
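That iteration behaviour can be sketched without a database: any object implementing PHP 5's Iterator interface can be handed to foreach, which stops when valid() returns false. The class below is a toy stand-in for a result set, not part of PDO:

```php
<?php
// Toy iterator that yields rows the way a result set would.
class RowSet implements Iterator
{
    private $rows;
    private $pos = 0;

    public function __construct(array $rows) { $this->rows = $rows; }

    public function current() { return $this->rows[$this->pos]; }
    public function key()     { return $this->pos; }
    public function next()    { $this->pos++; }
    public function rewind()  { $this->pos = 0; }

    // foreach terminates as soon as this returns false -- i.e. when
    // the "result set" runs out of rows, not on a false row value.
    public function valid()   { return $this->pos < count($this->rows); }
}

$orders = array();
foreach (new RowSet(array(array('order' => 10), array('order' => 12))) as $row) {
    $orders[] = $row['order'];
}
// $orders now holds both 'order' values in sequence: 10, then 12
```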

Mar 30 '07 #10

bill wrote:
I will add a autonumbering PK
Auto-numbered synthetic primary keys are the root of all evil. They usually
indicate a lazy approach to database design. There is almost always a
natural column or combination of columns that can be used as a primary key
without the need to add an extra, redundant numerical column which doesn't
contain any useful information.

Assuming that you never plan on having two events that occur
concurrently, your "order" column is a perfect natural candidate key.
(Though it's a little poorly named, given that ORDER is a SQL keyword.)

Mar 30 '07 #11

Toby A Inkster wrote:
bill wrote:
>I will add a autonumbering PK

Auto-numbered synthetic primary keys are the root of all evil. They
usually indicate a lazy approach to database design. There is almost
always a natural column or combination of columns that can be used as a
primary key without the need to add an extra, redundant numerical column
which doesn't contain any useful information.

Assuming that the you never plan on having two events that occur
concurrently, your "order" column is a perfect natural candidate key.
(Though it's a little poorly named, given that ORDER is a SQL keyword.)
Hi Toby,

Despite my high respect for you: I completely disagree.
Why pick 'natural candidates' if you can ALWAYS make it work with a
simple autonumbering PK?
What if your database must be upgraded and the logic changes?
Do you want to check all the columns again to be sure the PK still makes
sense? (Or watch it fail in a production environment when a UNIQUE
constraint you didn't see coming beforehand is hit?)

I have been using autonumbering PKs my whole programming career, and have
never had any problems with them.

Who seriously cares about the few extra bytes needed?

I am not alone with that thought.
Postgres even makes an OID for each row, something you don't even see but
can use if you want.

It is perfectly natural IMO, when designing a database, to point to each row
in a simple, coherent, and easy-to-understand fashion by using
autonumbering PKs.
It also makes it very easy to use FKs.

I don't get it why you think of that as 'the root of all evil'.
What evil comes out of it?

Regards,
Erwin Moller

Mar 30 '07 #12

Erwin Moller wrote:
Why picking 'natural candidates' if you can make it work ALWAYS with a
simple autonumbering PK?
As someone who normally spends quite some time refining my database schema
before creating the database, by the time I've come to pick a primary key,
I've normally already decided on one or two UNIQUE constraints, so it's a
simple matter of deciding which of the UNIQUE constraints is fit to be the
primary key.

Besides, there are other techniques that can be said to "always work". For
example, the creation of a varchar column "code" as a primary key. It's
still a surrogate key really, but it can take on more meaning than an
auto-numbered surrogate.

For example, in one of my current projects, I need to store several
articles, each of which must be issued under a particular licence (e.g.
GPL, FDL, Creative Commons). The "auto-number everything" solution would
be:

======================================
table: articles
--------------------------------------
article_id integer, autonumbered
title varchar
body varchar
licence integer
======================================

======================================
table: licences
--------------------------------------
licence_id integer
licence_name varchar
licence_link varchar
======================================

Example data in table licences:

1 GNU General Public Licence http://www.gnu...
2 GNU Free Documentation Licence http://www.gnu...
3 Creative Commons Licence http://www.cre...

Using a manually-named varchar surrogate primary key, you could have

======================================
table: articles
--------------------------------------
article_id integer, autonumbered
title varchar
body varchar
licence char(8)
======================================

======================================
table: licences
--------------------------------------
licence_code char(8)
licence_name varchar
licence_link varchar
======================================

Example data in table licences:

GPL GNU General Public Licence http://www.gnu...
FDL GNU Free Documentation Licence http://www.gnu...
CC Creative Commons Licence http://www.cre...

Usage is fairly similar, apart from the fact that now, when you look at
the table 'articles', you can still infer a bit of information about
which licence each article is under, without having to join onto the
licences table.

This isn't *always* a good approach, but it's often a lot better than an
auto-numbered key.

And before you say that this is a waste of space because char(8) takes up
eight bytes rather than four for an integer: you're second-guessing the
database engine there. Database engines really are dead clever. Most will
only store the full char(8) string in the licences table (likely to be
quite small compared to the articles table), and when storing the licence
column of the articles table will actually use a pointer back to the same
string data from the licences table -- very fast. (And as far as sorting
is concerned, a short, indexed char field is just as fast as an integer.)
What if your database must be upgraded and the logic changes?
Do you want to check all the columns again to be sure the PK still makes
sense? (Or watch it fail in a production environment when the UNIQUE
constraint is hit you didn't see coming beforehand)
When the logic changes in some major way, you're probably going to need to
make adjustments to several tables anyway. I don't see this as a major
problem.

Besides which, it's often quite easy to choose a column that will always
be unique. For example for a table of users, instead of:

user_id auto_increment (primary key)
login varchar
password varchar
realname varchar
email_address varchar

All you need is:

login varchar (primary key)
password varchar
realname varchar
email_address varchar

What's more, say you then have another table with a column that is a
foreign key to your user table; looking down that column you don't see a
bunch of numbers like "12, 14, 71, 14" -- you see "brian, dave, greg, dave".

Logic changing is never going to be a problem -- you're never going to
have two users with the same login name.
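In DDL the two variants of that user table differ only in where the key lands; a sketch with hypothetical column sizes (AUTO_INCREMENT is the MySQL spelling):

```sql
-- Surrogate-key variant: extra auto-numbered column
CREATE TABLE users_surrogate (
    user_id       INTEGER AUTO_INCREMENT PRIMARY KEY,
    login         VARCHAR(32) NOT NULL UNIQUE,
    password      VARCHAR(64),
    realname      VARCHAR(64),
    email_address VARCHAR(128)
);

-- Natural-key variant: the login itself is the primary key
CREATE TABLE users_natural (
    login         VARCHAR(32) PRIMARY KEY,
    password      VARCHAR(64),
    realname      VARCHAR(64),
    email_address VARCHAR(128)
);
```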
I have been using autonumbering PK my whole programming carrier, and never
had any problems with it.
No doubt -- the problem occurs when people blindly use surrogate keys
without thinking about whether the rest of their columns could be keys --
which I'm sure you'd never do Erwin! :-)

By doing that, they put less thought into UNIQUE constraints, and end up
with lots of unwanted rows which would be duplicates, except for their
primary key. If they hadn't added that extra primary key number, then they
wouldn't have a database full of duplicate values.
Who seriously cares about the few extra bytes needed?
Not me.
I am not alone with that thought.
Postgres even makes an OID for each row, something you don't even see but
can use if you want.
It's perfectly easy to disable OIDs in PostgreSQL on a case-by-case basis
or permanently. Most of the time, I disable OIDs.

On one of my current projects, I've got twelve tables, only one of which
has a surrogate integer primary key (technically it doesn't autonumber,
but I use MAX() to simulate autonumbering when creating a new record).
Guess which table is causing me the most problems?

It's a table of articles. A table of comments references the articles via
its numeric key. Now what happens if I write an updated version of an
article, but want to keep a copy of the old one? I mark a status flag on
the old one to hide it, then create another article with the same URL but
a different ID number. Unfortunately the comments still point to the old
ID number, so are not seen when you visit the new page.

Better would have been to design my table so that it had a primary key
like (url, revision).
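Such a composite key might be declared like this (hypothetical types and sizes):

```sql
CREATE TABLE articles (
    url      VARCHAR(255) NOT NULL,
    revision INTEGER      NOT NULL,
    title    VARCHAR(255),
    body     TEXT,
    PRIMARY KEY (url, revision)
);
-- Comments can then be attached to the url rather than to a
-- per-revision id, so they survive a new revision of the article.
```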

Anyhow, it's mostly a matter of taste. I was being somewhat tongue in
cheek when describing surrogate keys as "the root of all evil", but I
can't stand to see a good candidate key go to waste.

Mar 30 '07 #13

Toby A Inkster wrote:
Erwin Moller wrote:
>Why picking 'natural candidates' if you can make it work ALWAYS with a
simple autonumbering PK?
Hi Toby,
As someone who normally spends quite some time refining my database schema
before creating the database, by the time I've come to pick a primary key,
I've normally already decided on one or two UNIQUE constraints, so it's a
simple matter of deciding which of the UNIQUE constraints is fit to be the
primary key.

Besides, there are other techniques that can be said to "always work". For
example, the creation of a varchar column "code" as a primary key. It's
still a surrogate key really, but it can take on more meaning than an
auto-numbered surrogate.

For example, in one of my current projects, I need to store several
articles, each of which must be issued under a particular licence (e.g.
GPL, FDL, Creatice Commons). The "auto-number everything" solution would
be:

======================================
table: articles
--------------------------------------
article_id integer, autonumbered
title varchar
body varchar
licence integer
======================================

======================================
table: licences
--------------------------------------
licence_id integer
licence_name varchar
licence_link varchar
======================================

Example data in table licences:

1 GNU General Public Licence http://www.gnu...
2 GNU Free Documentation Licence http://www.gnu...
3 Creative Commons Licence http://www.cre...

Using a manually-named varchar surrogate primary key, you could have

======================================
table: articles
--------------------------------------
article_id integer, autonumbered
title varchar
body varchar
licence char(8)
======================================

======================================
table: licences
--------------------------------------
licence_code char(8)
licence_name varchar
licence_link varchar
======================================

Example data in table licences:

GPL GNU General Public Licence http://www.gnu...
FDL GNU Free Documentation Licence http://www.gnu...
CC Creative Commons Licence http://www.cre...

Usage is fairly similar, apart from the fact that now, when you look at
the table 'articles' without doing any joins, you can still infer a bit of
information about which licence each article is under, without having to
inner join onto the licences table.

This isn't *always* a good approach, but it's often a lot better than an
auto-numbered key.
OK, clear example.
This has the advantage that you get more descriptive PKs.

>
And before you say that this is a waste of space as char(8) takes up eight
bytes rather than 4 bytes for an integer, you're second-guessing the
database engine there.
I won't say that. :-)
I never care too much about a byte or two extra if it increases readability
or structure, etc.
Database engines really are dead clever. Most will
only store the full char(8) string in the licences table (likely to be
quite small compared to the articles table), and when storing the licence
column of the articles table will actually use a pointer back to the same
string data from the licence table -- very fast. Database engines really
are dead clever. (And as far as sorting is concerned, a short, indexed
char field is just as fast as an integer.)
Yes, B-tree index lookups (and the like) are extremely fast.
>
>What if your database must be upgraded and the logic changes?
Do you want to check all the columns again to be sure the PK still makes
sense? (Or watch it fail in a production environment when the UNIQUE
constraint is hit you didn't see coming beforehand)

When the logic changes in some major way, you're probably going to need to
make adjustments to several tables anyway. I don't see this as a major
problem.

Besides which, it's often quite easy to choose a column that will always
be unique. For example for a table of users, instead of:

user_id auto_increment (primary key)
login varchar
password varchar
realname varchar
email_address varchar

All you need is:

login varchar (primary key)
password varchar
realname varchar
email_address varchar

Whatsmore, say you then have another table which has a column that has a
foreign key for your user table, looking down that column you don't see a
bunch of numbers like "12, 14, 71, 14" -- you see "brian, dave, greg,
dave".
Well, a simple join solves that.
>
Logic changing is never going to be a problem -- you're never going to
have two users with the same login name.
Well, you gave me exactly an example where your approach is weaker than
using autonumbering PKs, I think.

Consider this example:
1) Your tbluser as described above by you.
2) A collection of articles written by that user:

CREATE TABLE tblarticle(
articleid serial PRIMARY KEY,
writtenby varchar REFERENCES tbluser(login),
content VARCHAR(10000)
)

I made a FK as you see.
Now what happens if that user wants a new loginname?

You are forced to update this new login name everywhere in your database
where you use it.
Or do you use cascading?
(I dislike cascading through all FK constraints, but it is of course
possible.)
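For completeness, the cascading variant would be declared on the FK itself; a sketch against the tbluser/tblarticle tables above (the login values are placeholders):

```sql
CREATE TABLE tblarticle(
    articleid SERIAL PRIMARY KEY,
    writtenby varchar REFERENCES tbluser(login) ON UPDATE CASCADE,
    content   VARCHAR(10000)
);

-- Renaming the login now rewrites every referencing row:
UPDATE tbluser SET login = 'newname' WHERE login = 'oldname';
```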
>
>I have been using autonumbering PK my whole programming carrier, and
never had any problems with it.

No doubt -- the problem occurs when people blindly use surrogate keys
without thinking about whether the rest of their columns could be keys --
which I'm sure you'd never do Erwin! :-)
;-)

>
By doing that, they put less thought into UNIQUE constraints, and end up
with lots of unwanted rows which would be duplicates, except for their
primary key. If they hadn't added that extra primary key number, then they
wouldn't have a database full of duplicate values.
Well, that is normalization.
The more databases you design, the better you get at grouping stuff and
making entities.
I never store a piece of information twice.
Almost never, unless the joining would take so much calculation that I
think it is better to denormalize a little.

>
>Who seriously cares about the few extra bytes needed?

Not me.
>I am not alone with that thought.
Postgres even makes an OID for each row, something you don't even see but
can use if you want.

It's perfectly easy to disable OIDs in PostgreSQL on a case-by-case basis
or permanently. Most of the time, I disable OIDs.

On one of my current projects, I've got twelve tables, only one of which
has a surrogate integer primary key (technically it doesn't autonumber,
but I use MAX() to simulate autonumbering when creating a new record).
What happened to good old nextval()?
Guess which table is causing me the most problems?
The other ones? ;-)
>
It's a table of articles. A table of comments references the articles via
its numeric key. Now what happens if I write an updated version of an
article, but want to keep a copy of the old one? I mark a status flag on
the old one to hide it, then create another article with the same URL but
a different ID number. Unfortunately the comments still point to the old
ID number, so are not seen when you visit the new page.

Better would have been to design my table so that it had a primary key
like (url, revision).
Yes, that would be a nice solution.

I would typically do something like this:

CREATE TABLE tblrevisions(
revisionid SERIAL PRIMARY KEY,
content VARCHAR(10000),
createdate timestamp
)

CREATE TABLE tblarticle(
articleid SERIAL PRIMARY KEY,
url varchar(1000),
activerevisionid integer REFERENCES tblrevisions(revisionid)
)

CREATE TABLE comments(
commentid SERIAL PRIMARY KEY,
comment VARCHAR(1000),
articleid integer REFERENCES tblarticle(articleid)
)

So you have your comments pointing at the article, and not the revision.
But, as you say, this is a matter of taste.
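As a sketch of why the comments stay attached when a new revision is published, here is that three-table design run through Python's sqlite3 (an illustrative stand-in engine; the sample URLs, dates, and rows are invented):

```python
import sqlite3

# Erwin's three tables, with comments keyed to the stable articleid
# rather than to any one revision.
db = sqlite3.connect(":memory:", isolation_level=None)
db.executescript("""
CREATE TABLE tblrevisions (
    revisionid INTEGER PRIMARY KEY,
    content TEXT,
    createdate TEXT
);
CREATE TABLE tblarticle (
    articleid INTEGER PRIMARY KEY,
    url TEXT,
    activerevisionid INTEGER REFERENCES tblrevisions(revisionid)
);
CREATE TABLE comments (
    commentid INTEGER PRIMARY KEY,
    comment TEXT,
    articleid INTEGER REFERENCES tblarticle(articleid)
);
INSERT INTO tblrevisions VALUES (1, 'first draft', '2007-03-30');
INSERT INTO tblarticle VALUES (1, '/hello', 1);
INSERT INTO comments VALUES (1, 'nice article', 1);
""")

# Publishing a new revision only repoints the article row; the comment
# keeps following the article, not the hidden old revision.
db.execute("INSERT INTO tblrevisions VALUES (2, 'second draft', '2007-03-31')")
db.execute("UPDATE tblarticle SET activerevisionid = 2 WHERE articleid = 1")

row = db.execute("""
    SELECT r.content, c.comment
    FROM tblarticle a
    JOIN tblrevisions r ON r.revisionid = a.activerevisionid
    JOIN comments c ON c.articleid = a.articleid
""").fetchone()
print(row)  # ('second draft', 'nice article')
```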
>
Anyhow, it's mostly a matter of taste. I was being somewhat tongue in
cheek when describing surrogate keys as "the root of all evil", but I
can't stand to see a good candidate key go to waste.
:-)
Clear.

I liked this discussion.
Once I taught myself to work with autonumbering PKs, good normalization, and
FKs, I never gave the primary key much second thought.
Always good to hear other opinions.

Best of luck!

Regards,
Erwin Moller
Mar 30 '07 #14

Toby A Inkster wrote:
Erwin Moller wrote:
>Why picking 'natural candidates' if you can make it work ALWAYS with a
simple autonumbering PK?

As someone who normally spends quite some time refining my database schema
before creating the database, by the time I've come to pick a primary key,
I've normally already decided on one or two UNIQUE constraints, so it's a
simple matter of deciding which of the UNIQUE constraints is fit to be the
primary key.

Besides, there are other techniques that can be said to "always work". For
example, the creation of a varchar column "code" as a primary key. It's
still a surrogate key really, but it can take on more meaning than an
auto-numbered surrogate.

Hi Toby. I have to agree with Erwin here. As much as I respect you, I
have to disagree with you. Please see below...

For example, in one of my current projects, I need to store several
articles, each of which must be issued under a particular licence (e.g.
GPL, FDL, Creative Commons). The "auto-number everything" solution would
be:

======================================
table: articles
--------------------------------------
article_id integer, autonumbered
title varchar
body varchar
licence integer
======================================

======================================
table: licences
--------------------------------------
licence_id integer
licence_name varchar
licence_link varchar
======================================

Example data in table licences:

1 GNU General Public Licence http://www.gnu...
2 GNU Free Documentation Licence http://www.gnu...
3 Creative Commons Licence http://www.cre...

Using a manually-named varchar surrogate primary key, you could have

======================================
table: articles
--------------------------------------
article_id integer, autonumbered
title varchar
body varchar
licence char(8)
======================================

======================================
table: licences
--------------------------------------
licence_code char(8)
licence_name varchar
licence_link varchar
======================================

Example data in table licences:

GPL GNU General Public Licence http://www.gnu...
FDL GNU Free Documentation Licence http://www.gnu...
CC Creative Commons Licence http://www.cre...

Usage is fairly similar, apart from the fact that now, when you look at
the table 'articles' without doing any joins, you can still infer a bit of
information about which licence each article is under, without having to
inner join onto the licences table.
True, but you also have to look at performance issues. When searching
an index, comparing an int is always faster than comparing a varchar.
And comparing a single column is always faster than comparing multiple
columns. And the index file itself is smaller.
This isn't *always* a good approach, but it's often a lot better than an
auto-numbered key.
Additionally, the PK should not be dependent on data which may change -
i.e. if part of your key was the license (code), what would happen if
they changed the licensing terms?
And before you say that this is a waste of space as char(8) takes up eight
bytes rather than 4 bytes for an integer, you're second-guessing the
database engine there. Database engines really are dead clever. Most will
only store the full char(8) string in the licences table (likely to be
quite small compared to the articles table), and when storing the licence
column of the articles table will actually use a pointer back to the same
string data from the licence table -- very fast. Database engines really
are dead clever. (And as far as sorting is concerned, a short, indexed
char field is just as fast as an integer.)
Not generally, they don't, because of potential problems. For instance,
if the data is stored as a pointer to the license table, the system
has to do an additional file lookup to fetch the license data. And what
happens if the entry from the license table is altered - or worse yet,
deleted? There is no referential integrity built in here. Deleting an
item from the license table would require all other tables which point
to that entry be updated - that is, the varchar data would have to be
reinserted into each row in every table which pointed to the license table.

And in your case, the data would be stored in 4 bytes ("GPL" + 1 byte
length). So it would take the same 4 bytes - but comparisons would
still be slower.
>What if your database must be upgraded and the logic changes?
Do you want to check all the columns again to be sure the PK still makes
sense? (Or watch it fail in a production environment when the UNIQUE
constraint is hit you didn't see coming beforehand)

When the logic changes in some major way, you're probably going to need to
make adjustments to several tables anyway. I don't see this as a major
problem.
Even when logic changes I don't generally have to change table design,
other than to perhaps add a column or two. Good normalization
techniques help here.
Besides which, it's often quite easy to choose a column that will always
be unique. For example for a table of users, instead of:

user_id auto_increment (primary key)
login varchar
password varchar
realname varchar
email_address varchar

All you need is:

login varchar (primary key)
password varchar
realname varchar
email_address varchar

What's more, say you then have another table with a column that is a
foreign key into your user table. Looking down that column you don't see a
bunch of numbers like "12, 14, 71, 14" -- you see "brian, dave, greg, dave".

Logic changing is never going to be a problem -- you're never going to
have two users with the same login name.
That's true. But you're also taking up more storage space and slowing
down searches.
>I have been using autonumbering PKs my whole programming career, and never
had any problems with them.

No doubt -- the problem occurs when people blindly use surrogate keys
without thinking about whether the rest of their columns could be keys --
which I'm sure you'd never do Erwin! :-)

By doing that, they put less thought into UNIQUE constraints, and end up
with lots of unwanted rows which would be duplicates, except for their
primary key. If they hadn't added that extra primary key number, then they
wouldn't have a database full of duplicate values.
You still can (and should) put serious thought into unique constraints.
Just having a separate PK shouldn't change that.
>Who seriously cares about the few extra bytes needed?

Not me.
I do. When it comes to very large databases, this can add up very
quickly. And it can slow the system down significantly.
>I am not alone with that thought.
Postgres even makes an OID for each row, something you don't even see but
can use if you want.

It's perfectly easy to disable OIDs in PostgreSQL on a case-by-case basis
or permanently. Most of the time, I disable OIDs.

On one of my current projects, I've got twelve tables, only one of which
has a surrogate integer primary key (technically it doesn't autonumber,
but I use MAX() to simulate autonumbering when creating a new record).
Guess which table is causing me the most problems?
I can imagine. Using MAX() like this can cause concurrency problems.
That's why RDB designers came out with the auto-numbering columns.
It's a table of articles. A table of comments references the articles via
its numeric key. Now what happens if I write an updated version of an
article, but want to keep a copy of the old one? I mark a status flag on
the old one to hide it, then create another article with the same URL but
a different ID number. Unfortunately the comments still point to the old
ID number, so are not seen when you visit the new page.
It's not that hard to update the comment table when updating the article
table.
Better would have been to design my table so that it had a primary key
like (url, revision).
But then your comment table wouldn't point to it anyway if you use (url,
revision). Or, if you just use url, your comment table would be
pointing at two different entries - which is not good foreign key design.
Anyhow, it's mostly a matter of taste. I was being somewhat tongue in
cheek when describing surrogate keys as "the root of all evil", but I
can't stand to see a good candidate key go to waste.
Here, I agree. But I tend to lean more towards the performance side.
Design the database with efficiency in mind. That's why all databases
aren't 5NF form.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Mar 30 '07 #15

>I will add an autonumbering PK
>
Auto-numbered synthetic primary keys are the root of all evil. They usually
indicate a lazy approach to database design. There is almost always a
natural column or combination of columns that can be used as a primary key
without the need to add an extra, redundant numerical column which doesn't
contain any useful information.

Assuming that the you never plan on having two events that occur
concurrently, your "order" column is a perfect natural candidate key.
I disagree, if the nature of that key is such that insertion or deletion
of a record requires re-numbering of other records. I believe the OP wanted
something so that the next upcoming event was *always* numbered 10.
If it's acceptable at some point that the events in the table are numbered
85123, 85127, and 85136, that's fine.
>(Though it's a little poorly named, given that ORDER is a SQL keyword.)
Mar 30 '07 #16

Jerry Stuckle wrote:
Toby A Inkster wrote:
>Erwin Moller wrote:

Example data in table licences:

GPL GNU General Public Licence http://www.gnu...
FDL GNU Free Documentation Licence http://www.gnu...
CC Creative Commons Licence http://www.cre...

Usage is fairly similar, apart from the fact that now, when you look at
the table 'articles' without doing any joins, you can still infer a bit of
information about which licence each article is under, without having to
inner join onto the licences table.

True, but you also have to look at performance issues. When searching
an index, comparing an int is always faster than comparing a varchar.
And comparing a single column is always faster than comparing multiple
columns. And the index file itself is smaller.
Firstly, I said char, not varchar. char columns are fixed width storage
and consequently a lot faster to search than varchar columns. On a 32-bit
processor, char(4) is particularly fast as it corresponds to the
processor's word length.
Additionally, the PK should not be dependent on data which may change -
i.e. if part of your key was the license (code), what would happen if
they changed the licensing terms?
Firstly, in this example, the licence code is an arbitrary string chosen by
the creator of the data, so it needs to change no more often than an
integer key would.

Secondly, it is perfectly acceptable for a primary key to change. A decent
database can even cascade changes to dependent tables for you.
>And before you say that this is a waste of space as char(8) takes up
eight bytes rather than 4 bytes for an integer, you're second-guessing
the database engine there. Database engines really are dead clever.

Not generally, they don't, because of potential problems. For instance,
if the data is stored as a pointer to the license table, the system
has to do an additional file lookup to fetch the license data. And what
happens if the entry from the license table is altered - or worse yet,
deleted? There is no referential integrity built in here. Deleting an
item from the license table would require all other tables which point
to that entry be updated - that is, the varchar data would have to be
reinserted into each row in every table which pointed to the license
table.
Of course there's referential integrity built-in, via foreign key
constraints. If foreign keys are correctly set up, then depending on
how you configured it, one of the following actions would happen when
you tried to delete a licence from the licences table which was being
used in the articles table:

1. the RDBMS would not allow you to delete it;
2. the articles which used that licence would also be deleted; or
3. the articles which used that licence would have their licence
set to the default licence.

It's all about cascading.
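A sketch of the first of those behaviours (the RDBMS refusing the delete), again using Python's sqlite3 as a stand-in; the schema is a cut-down version of the licences example, and swapping the ON DELETE action to CASCADE or SET DEFAULT would give behaviours 2 and 3:

```python
import sqlite3

# Cut-down licences example, with the FK set to refuse deletes.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
db.executescript("""
CREATE TABLE licences (licence_code TEXT PRIMARY KEY);
CREATE TABLE articles (
    article_id INTEGER PRIMARY KEY,
    licence TEXT REFERENCES licences(licence_code) ON DELETE RESTRICT
);
INSERT INTO licences VALUES ('GPL');
INSERT INTO articles VALUES (1, 'GPL');
""")

# Behaviour 1: the RDBMS will not let you delete a licence still in use.
try:
    db.execute("DELETE FROM licences WHERE licence_code = 'GPL'")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True
print(blocked)  # True
```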
And in your case, the data would be stored in 4 bytes ("GPL" + 1 byte
length). So it would take the same 4 bytes - but comparisons would
still be slower.
Greater than / less than comparisons would be slower. But (case sensitive)
equal to / not equal to comparisons should be of equivalent speed, as it's
a simple matter of choosing if one 32-bit value is the same as another
32-bit value -- exactly the same process used if they were integers.
Probably the same CPU instructions used.
That's true. But you're also taking up more storage space and slowing
down searches.
More storage space on, say, the articles table, for storing the user's
login in a column instead of an integer; but less space in the user table,
as there's no need to store an extra integer column.

Slower to perform a join between the article and user tables, but fewer
occasions when you'll need to perform a join, as you have some useful user
information (the login) in the article table already.

>On one of my current projects, I've got twelve tables, only one of
which has a surrogate integer primary key (technically it doesn't
autonumber, but I use MAX() to simulate autonumbering when creating a
new record). Guess which table is causing me the most problems?

I can imagine. Using MAX() like this can cause concurrency problems.
That's why RDB designers came out with the auto-numbering columns.
Not heard of transactions?
>Better would have been to design my table so that it had a primary key
like (url, revision).

But then your comment table wouldn't point to it anyway if you use (url,
revision). Or, if you just use url, your comment table would be
pointing at two different entries - which is not good foreign key
design.
It would be pointing to a partial key, which I think is against 4NF or
5NF, but I mostly ignore those two as I've found them of very little
practical use.
>Anyhow, it's mostly a matter of taste. I was being somewhat tongue in
cheek when describing surrogate keys as "the root of all evil", but I
can't stand to see a good candidate key go to waste.

Here, I agree. But I tend to lean more towards the performance side.
Design the database with efficiency in mind. That's why all databases
aren't 5NF form.
I tend to err on the side of my own sanity. As far as performance is
concerned, I'm of the opinion that it's premature optimisation, and that
the RDBMS programmers will have already done a much better job than I
would have done.

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!
Mar 31 '07 #17

Toby A Inkster wrote:
Jerry Stuckle wrote:
>Toby A Inkster wrote:
>>Erwin Moller wrote:

Example data in table licences:

GPL GNU General Public Licence http://www.gnu...
FDL GNU Free Documentation Licence http://www.gnu...
CC Creative Commons Licence http://www.cre...

Usage is fairly similar, apart from the fact that now, when you look at
the table 'articles' without doing any joins, you can still infer a bit of
information about which licence each article is under, without having to
inner join onto the licences table.
True, but you also have to look at performance issues. When searching
an index, comparing an int is always faster than comparing a varchar.
And comparing a single column is always faster than comparing multiple
columns. And the index file itself is smaller.

Firstly, I said char, not varchar. char columns are fixed width storage
and consequently a lot faster to search than varchar columns. On a 32-bit
processor, char(4) is particularly fast as it corresponds to the
processor's word length.
True, but unless it's a binary column the search must be case-insensitive.

Also, even if you specify char(4) the field may not end up that way.
For instance, in MySQL if you have a mixture of both char and varchar
columns, the result is generally all char columns are converted to varchar.

Finally, even if none of the above are true, an int column will almost
always be word aligned in memory. A char field - even if it's char(4),
is not always word aligned.
>Additionally, the PK should not be dependent on data which may change -
i.e. if part of your key was the license (code), what would happen if
they changed the licensing terms?

Firstly, in this example, the licence code is an arbitrary string chosen by
the creator of the data, so it needs to change no more often than an
integer key would.
So why not just use an auto-number field? It seems you're unnecessarily
creating a bottleneck.
Secondly, it is perfectly acceptable for a primary key to change. A decent
database can even cascade changes to dependent tables for you.
Yes, it's acceptable. But it's not recommended, and good database
design would argue against it.
>>And before you say that this is a waste of space as char(8) takes up
eight bytes rather than 4 bytes for an integer, you're second-guessing
the database engine there. Database engines really are dead clever.
Not generally, they don't, because of potential problems. For instance,
if the data is stored as a pointer to the license table, the system
has to do an additional file lookup to fetch the license data. And what
happens if the entry from the license table is altered - or worse yet,
deleted? There is no referential integrity built in here. Deleting an
item from the license table would require all other tables which point
to that entry be updated - that is, the varchar data would have to be
reinserted into each row in every table which pointed to the license
table.

Of course there's referential integrity built-in, via foreign key
constraints. If foreign keys are correctly set up, then depending on
how you configured it, one of the following actions would happen when
you tried to delete a licence from the licences table which was being
used in the articles table:

1. the RDBMS would not allow you to delete it;
2. the articles which used that licence would also be deleted; or
3. the articles which used that licence would have their licence
set to the default licence.
Sure. But you don't have to use referential integrity. And no,
databases do not just point to another table, as I indicated. Look at
the data in your tables. You will find the char fields.
It's all about cascading.
>And in your case, the data would be stored in 4 bytes ("GPL" + 1 byte
length). So it would take the same 4 bytes - but comparisons would
still be slower.

Greater than / less than comparisons would be slower. But (case sensitive)
equal to / not equal to comparisons should be of equivalent speed, as it's
a simple matter of choosing if one 32-bit value is the same as another
32-bit value -- exactly the same process used if they were integers.
Probably the same CPU instructions used.
But char values are not necessarily word aligned, and they aren't
necessarily handled as a 32 bit int.

In fact, even if they were - the database would have to make the
conscious decision that it is valid, and load the fields on a word
boundary. Both take additional time over a simple int field.
>That's true. But you're also taking up more storage space and slowing
down searches.

More storage space on, say, the articles table, for storing the user's
login in a column instead of an integer; but less space in the user table,
as there's no need to store an extra integer column.
And the result is more space required. And every additional table
requires that much extra space.
Slower to perform a join between the article and user tables, but fewer
occasions when you'll need to perform a join, as you have some useful user
information (the login) in the article table already.

Even searching for the row identified by the PK will be slower.
>>On one of my current projects, I've got twelve tables, only one of
which has a surrogate integer primary key (technically it doesn't
autonumber, but I use MAX() to simulate autonumbering when creating a
new record). Guess which table is causing me the most problems?
I can imagine. Using MAX() like this can cause concurrency problems.
That's why RDB designers came out with the auto-numbering columns.

Not heard of transactions?
Sure. And in this case transactions cause their own problems. You need
at least 4 calls to the database:

START TRANSACTION
SELECT MAX(colId)+1 FROM mytable
INSERT INTO mytable ...
COMMIT

Significantly slower than a single call which returns an auto-numbered
column. And worse yet, it will tie up the table for additional changes
until the COMMIT. Very bad for concurrency.
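A hedged sketch of that four-step pattern in Python's sqlite3 (an illustrative stand-in engine; `mytable` and `colId` come from the snippet above, and the sample rows are invented). The point is that the SELECT and INSERT must share one transaction, otherwise two concurrent writers can compute the same id:

```python
import sqlite3

# Jerry's four-step MAX()+1 pattern, wrapped in an explicit transaction.
db = sqlite3.connect(":memory:", isolation_level=None)  # manual BEGIN/COMMIT
db.execute("CREATE TABLE mytable (colId INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO mytable VALUES (1, 'first')")

db.execute("BEGIN")                                       # START TRANSACTION
next_id = db.execute("SELECT MAX(colId) + 1 FROM mytable").fetchone()[0]
db.execute("INSERT INTO mytable VALUES (?, ?)", (next_id, 'second'))
db.execute("COMMIT")
print(next_id)  # 2
```

An auto-numbered column folds all four round trips into the single INSERT, which is Jerry's point about concurrency.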
>>Better would have been to design my table so that it had a primary key
like (url, revision).
But then your comment table wouldn't point to it anyway if you use (url,
revision). Or, if you just use url, your comment table would be
pointing at two different entries - which is not good foreign key
design.

It would be pointing to a partial key, which I think is against 4NF or
5NF, but I mostly ignore those two as I've found them of very little
practical use.
It's against foreign key constraints, also. The referenced column(s)
must be either the primary key or have a unique index. Non-unique
entries are not allowed.
>>Anyhow, it's mostly a matter of taste. I was being somewhat tongue in
cheek when describing surrogate keys as "the root of all evil", but I
can't stand to see a good candidate key go to waste.
Here, I agree. But I tend to lean more towards the performance side.
Design the database with efficiency in mind. That's why all databases
aren't 5NF form.

I tend to err on the side of my own sanity. As far as performance is
concerned, I'm of the opinion that it's premature optimisation, and that
the RDBMS programmers will have already done a much better job than I
would have done.
It depends on the database and your needs. For something running 10K
hits a on a database with 50K rows, this is fine.

But I've worked on some databases with hundreds of millions of rows (not
all one table, of course) with size well into the terabyte region.
Number of requests range upwards of 10's of thousands per second.

Performance here is important. And I carry these same techniques down
to smaller databases. I don't consider it 'premature optimization' to
use good rdb design techniques - including 3NF.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Mar 31 '07 #18

Erwin Moller wrote:
Consider this example:
1) Your tbluser as described above by you.
2) a collection of articles written by that user

CREATE TABLE tblarticle(
articleid serial PRIMARY KEY,
writtenby login REFERENCES tbluser(login),
content VARCHAR(10000)
)

I made a FK as you see.
Now what happens if that user wants a new loginname?

You are forced to update this new loginname everwhere in your database where
you use it.
Or do you use cascading?
(I dislike cascading though all FK constraints, but it is of course
possible)
I do indeed use cascading. But I'd be tempted not to allow users to change
login anyway, as it would probably result in URL changes, which I try to
avoid as much as possible.
What happened to good old nextval()?
This particular project aims at supporting PostgreSQL, MySQL, Firebird, MS
SQL Server and Oracle. PostgreSQL sequences are nice, but
PostgreSQL-specific.

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!
Mar 31 '07 #19

>Secondly, it is perfectly acceptable for a primary key to change. A decent
>database can even cascade changes to dependent tables for you.
I'm curious as to how cascade can change things outside of the database
in the real world. For example, how does it change what's on an
invoice mailed to a customer yesterday (or perhaps last year)?
Apr 1 '07 #20

Jerry Stuckle wrote:
Toby A Inkster wrote:
>Firstly, I said char, not varchar. char columns are fixed width storage
and consequently a lot faster to search than varchar columns. On a
32-bit processor, char(4) is particularly fast as it corresponds to the
processor's word length.

True, but unless it's a binary column the search must be
case-insensitive.
I normally use PostgreSQL, in which all strings are handled
case-sensitively unless you specify otherwise.
Finally, even if none of the above are true, an int column will almost
always be word aligned in memory. A char field - even if it's char(4),
is not always word aligned.
Perhaps not word-aligned when stored as row data, but certainly
word-aligned in the index. And you'd nearly always want to index a primary
key column.
>Firstly, in this example, the licence code is an arbitrary string
chosen by the creator of the data, so it needs to change no more often
than an integer key would.

So why not just use an auto-number field. It seems you're unnecessarily
creating an unnecessary bottleneck.
If you use a piece of useful data as a primary key, instead of a
meaningless number, this in many cases reduces the need to perform
joins between tables, resulting in simplifying queries and speeding things
up. As an example, consider extracting a list of article titles plus the
author's login name from our articles and users tables:

My schema:

CREATE TABLE articles (
article_id integer PRIMARY KEY,
title varchar NOT NULL,
body varchar NOT NULL,
author char(16) REFERENCES users
);
CREATE TABLE users (
login char(16) PRIMARY KEY,
email varchar NOT NULL
);
-- Insert lots of data here.
SELECT title, author FROM articles;

Yours:

CREATE TABLE articles (
article_id integer PRIMARY KEY,
title varchar NOT NULL,
body varchar NOT NULL,
author integer REFERENCES users
);
CREATE TABLE users (
user_id integer PRIMARY KEY,
login char(16) NOT NULL,
email varchar NOT NULL,
UNIQUE (login)
);
-- Insert lots of data here.
SELECT a.title, u.login AS author
FROM articles a
INNER JOIN users u ON a.author=u.user_id;

The second schema has to query two tables and perform an inner join,
which is slower than performing a single query to a single table.
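A runnable sketch of the two schemas side by side, using Python's sqlite3 as a stand-in engine; the `_n`/`_s` table suffixes and the sample rows are invented for illustration, and TEXT stands in for char(16)/varchar:

```python
import sqlite3

db = sqlite3.connect(":memory:", isolation_level=None)
db.executescript("""
-- Natural-key variant: the article row already carries the login.
CREATE TABLE users_n (login TEXT PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE articles_n (
    article_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author TEXT REFERENCES users_n(login)
);
-- Surrogate-key variant: a join is needed to recover the login.
CREATE TABLE users_s (user_id INTEGER PRIMARY KEY, login TEXT UNIQUE NOT NULL);
CREATE TABLE articles_s (
    article_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author INTEGER REFERENCES users_s(user_id)
);
INSERT INTO users_n VALUES ('toby', 't@example.org');
INSERT INTO articles_n VALUES (1, 'On keys', 'toby');
INSERT INTO users_s VALUES (7, 'toby');
INSERT INTO articles_s VALUES (1, 'On keys', 7);
""")

# Same result set; the first needs no join, the second does.
no_join = db.execute("SELECT title, author FROM articles_n").fetchall()
joined = db.execute("""
    SELECT a.title, u.login AS author
    FROM articles_s a INNER JOIN users_s u ON a.author = u.user_id
""").fetchall()
print(no_join == joined)  # True
```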
Sure. But you don't have to use referential integrity. And no,
databases do not just point to another table, as I indicated. Look at
the data in your tables. You will find the char fields.
You will find the char fields if you use SQL. I'm talking about the binary
data that the RDBMS actually keeps in memory while it's running. They use
all kinds of tricks for speed boosts, including using pointers for primary
keys instead of using the real data.
Sure. And in this case transactions cause their own problems. You need
at least 4 calls to the database:

START TRANSACTION
SELECT MAX(colId)+1 FROM mytable
INSERT INTO mytable ...
COMMIT

Significantly slower than a single call which returns an auto-numbered
column. And worse yet, it will tie up the table for additional changes
until the COMMIT. Very bad for concurrency.
I'm sure autonumbering results in something fairly similar happening
deep down in the RDBMS anyway.

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!
Apr 1 '07 #21

Toby A Inkster wrote:
Jerry Stuckle wrote:
>Toby A Inkster wrote:
>>Firstly, I said char, not varchar. char columns are fixed width storage
and consequently a lot faster to search than varchar columns. On a
32-bit processor, char(4) is particularly fast as it corresponds to the
processor's word length.
True, but unless it's a binary column the search must be
case-insensitive.

I normally use PostgreSQL, in which all strings are handles
case-sensitively unless you specify otherwise.
Not all databases do this. And that still doesn't mean a char(4) will
be compared with the same code as an int.
>Finally, even if none of the above are true, an int column will almost
always be word aligned in memory. A char field - even if it's char(4),
is not always word aligned.

Perhaps not word-aligned when stored as row data, but certainly
row-aligned in the index. And you'd nearly always want to index a primary
key column.
Are you sure? I would highly doubt a char(4) would be word aligned in
any index. What a waste of space: with, say, char(4) and char(1) as your
index, you would have 60% more space taken up in alignment than you
have in data.

>>Firstly, in this example, the licence code is an arbitrary string
chosen by the creator of the data, so it needs to change no more often
than an integer key would.
So why not just use an auto-number field. It seems you're unnecessarily
creating an unnecessary bottleneck.

If you use a piece of useful data as a primary key, instead of a
meaningless number, this in many cases reduces the need to perform
joins between tables, resulting in simplifying queries and speeding things
up. As an example, consider extracting a list of article titles plus the
author's login name from our articles and users tables:
If the "useful data" means you are reducing the need to join tables, you
have a very poorly normalized database.
My schema:

CREATE TABLE articles (
article_id integer PRIMARY KEY,
title varchar NOT NULL,
body varchar NOT NULL,
author char(16) REFERENCES users
);
CREATE TABLE users (
login char(16) PRIMARY KEY,
email varchar NOT NULL
);
-- Insert lots of data here.
SELECT title, author FROM articles;

Yours:

CREATE TABLE articles (
article_id integer PRIMARY KEY,
title varchar NOT NULL,
body varchar NOT NULL,
author integer REFERENCES users
);
CREATE TABLE users (
user_id integer PRIMARY KEY,
login char(16) NOT NULL,
email varchar NOT NULL,
UNIQUE (login)
);
-- Insert lots of data here.
SELECT a.title, u.login AS author
FROM articles a
INNER JOIN users u ON a.author=u.user_id;

The second schema has to query two tables and perform an inner join,
which is slower than performing a single query to a single table.
Now - what happens if you have two authors named "John Smith"? Your
design just fell apart.

Also, a persons login id may not be the same as their name. And if I'm
authoring an article, I want my name on it, not my userid.

Not a design I would use.
>Sure. But you don't have to use referential integrity. And no,
databases do not just point to another table, as I indicated. Look at
the data in your tables. You will find the char fields.

You will find the char fields if you use SQL. I'm talking about the binary
data that the RDBMS actually keeps in memory while it's running. They use
all kinds of tricks for speed boosts, including using pointers for primary
keys instead of using the real data.
Sure, they use all kinds of speed boosts. But they still have to search
the tables. And they still have to do comparisons on strings.
>Sure. And in this case transactions cause their own problems. You need
at least 4 calls to the database:

START TRANSACTION
SELECT MAX(colId)+1 FROM mytable
INSERT INTO mytable ...
COMMIT

Significantly slower than a single call which returns an auto-numbered
column. And worse yet, it will tie up the table for additional changes
until the COMMIT. Very bad for concurrency.

I'm sure autonumbering results in something fairly similar happening
deep down in the RDBMS anyway.
I think you're making a bad assumption there, Toby. I suspect there is
nothing like that going on "deep down in the database". At least not in
any database I've worked with - which include DB2, Oracle, and SQL
Server in addition to MySQL.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Apr 1 '07 #22

P: n/a
Toby A Inkster wrote:
bill wrote:
>One additional question: does the foreach loop terminate because
the query return false ?

It terminates as soon as the query has run out of rows.

It's a PHP 5 feature called "Iterators" -- that is PDO queries are
returned as objects that, although they are not arrays, can be treated as
arrays in a foreach loop. The solution would work equally well using PHP 4
constructs, but the code doesn't look as neat.
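The PHP 5 behaviour Toby describes looks roughly like this (a sketch, assuming a PDO connection and the hypothetical table `foobar` used elsewhere in this thread):

```php
<?php
// Hypothetical connection settings.
$db = new PDO('mysql:host=localhost;dbname=some_database', 'user', 'pass');

// A PDOStatement implements Traversable, so foreach pulls one
// row at a time and stops when the result set is exhausted.
foreach ($db->query('SELECT `order` FROM foobar ORDER BY `order`') as $row) {
    echo $row['order'], "\n";
}
?>
```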
Thank you,

I am stuck using 4.x because my hosting company does not have
PHP5 available, yet.

What would be the php 4 code ?
Thanks to all (even Jerry, who thinks that we should be
elsewhere) for the great discussion.

bill
Apr 1 '07 #23

Jerry Stuckle wrote:
Now - what happens if you have two authors named "John Smith"? Your
design just fell apart.
Ummm... who said anything about real names -- I was only extracting their
login. Two authors with the same name would have different logins.
Also, a persons login id may not be the same as their name. And if I'm
authoring an article, I want my name on it, not my userid.
This might not be for generating a page. Perhaps articles are kept on
users' own home pages, so you want to generate a link like:

<a href="/~LOGIN/articles/">TITLE</a>
Not a design I would use.
I think we've established that. :-)

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!
Apr 1 '07 #24

Toby A Inkster wrote:
Jerry Stuckle wrote:
>Now - what happens if you have two authors named "John Smith"? Your
design just fell apart.

Ummm... who said anything about real names -- I was only extracting their
login. Two authors with the same name would have different logins.
Yes, and if I'm the author of an article, I want my real name on it, not
some logon id which may or may not be related to my name.
>Also, a persons login id may not be the same as their name. And if I'm
authoring an article, I want my name on it, not my userid.

This might not be for generating a page. Perhaps articles are kept on
users' own home pages, so you want to generate a link like:

<a href="/~LOGIN/articles/">TITLE</a>
OK, and what happens if that page changes? Now you have an out-of-date
link. Or worse yet - someone changes the contents from an article about
PHP to the latest news on broccoli.
>Not a design I would use.

I think we've established that. :-)

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Apr 2 '07 #25

bill wrote:
Toby A Inkster wrote:
>bill wrote:
>>One additional question: does the foreach loop terminate because the
query return false ?

It terminates as soon as the query has run out of rows.

It's a PHP 5 feature called "Iterators" -- that is PDO queries are
returned as objects that, although they are not arrays, can be treated as
arrays in a foreach loop. The solution would work equally well using
PHP 4
constructs, but the code doesn't look as neat.

THank you,

I am stuck using 4.x because my hosting company does not have PHP5
available, yet.

What would be the php 4 code ?
Thanks to all (even Jerry, who thinks that we should be elsewhere) for
the great discussion.

bill
Bill,

Well, you're asking how to update a SQL database with SQL...

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Apr 2 '07 #26

bill wrote:
I am stuck using 4.x because my hosting company does not have
PHP5 available, yet.
What would be the php 4 code ?
I'm not really familiar with the MySQL functions in PHP, because I use a
decent database; and it's been a while since I upgraded away from PHP 4,
but it should be something like this:

<?php
$db = mysql_connect(/* connection settings */);
mysql_select_db('some_database', $db);
mysql_query("ALTER TABLE foobar ADD tempcol integer;", $db);
$count = 0;
$rs = mysql_query("SELECT `order` FROM foobar ORDER BY `order`;", $db);
while ($row = mysql_fetch_array($rs))
{
    $count += 10;
    mysql_query(sprintf("UPDATE foobar SET tempcol=%d WHERE `order`=%d;",
        $count, $row['order']), $db);
}
mysql_query("UPDATE foobar SET `order`=tempcol;", $db);
mysql_query("ALTER TABLE foobar DROP tempcol;", $db);
unset($count, $rs);
?>
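As an aside, MySQL can do the renumbering in a single statement with a user variable. This is a common idiom rather than documented, guaranteed behaviour, so test it before relying on it; note that "order" is a reserved word and must be backquoted:

```sql
SET @n := 0;
UPDATE foobar
   SET `order` = (@n := @n + 10)
 ORDER BY `order`;
```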

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!
Apr 2 '07 #27

Toby A Inkster wrote:
bill wrote:
>I am stuck using 4.x because my hosting company does not have
PHP5 available, yet.
What would be the php 4 code ?

I'm not really familiar with the MySQL functions in PHP, because I use a
decent database; and it's been a while since I upgraded away from PHP 4,
but it should be something like this:

<?php
$db = mysql_connect(/* connection settings */);
mysql_select_db('some_database', $db);
mysql_query("ALTER TABLE foobar ADD tempcol integer;", $db);
$count = 0;
$rs = mysql_query("SELECT `order` FROM foobar ORDER BY `order`;", $db);
while ($row = mysql_fetch_array($rs))
{
    $count += 10;
    mysql_query(sprintf("UPDATE foobar SET tempcol=%d WHERE `order`=%d;",
        $count, $row['order']), $db);
}
mysql_query("UPDATE foobar SET `order`=tempcol;", $db);
mysql_query("ALTER TABLE foobar DROP tempcol;", $db);
unset($count, $rs);
?>
Many thanks - maybe I'll be able to use a "decent database".
I have used DataFlex for many years and am transitioning to SQL
and PHP. This is the first time that I have used a remote server.

what is a decent database to use with PHP, and why ?
bill
Apr 3 '07 #28

bill wrote:
Toby A Inkster wrote:
>bill wrote:
>>I am stuck using 4.x because my hosting company does not have PHP5
available, yet.
What would be the php 4 code ?

I'm not really familiar with the MySQL functions in PHP, because I use a
decent database; and it's been a while since I upgraded away from PHP 4,
but it should be something like this:

<?php
$db = mysql_connect(/* connection settings */);
mysql_select_db('some_database', $db);
mysql_query("ALTER TABLE foobar ADD tempcol integer;", $db);
$count = 0;
$rs = mysql_query("SELECT `order` FROM foobar ORDER BY `order`;", $db);
while ($row = mysql_fetch_array($rs))
{
    $count += 10;
    mysql_query(sprintf("UPDATE foobar SET tempcol=%d WHERE `order`=%d;",
        $count, $row['order']), $db);
}
mysql_query("UPDATE foobar SET `order`=tempcol;", $db);
mysql_query("ALTER TABLE foobar DROP tempcol;", $db);
unset($count, $rs);
?>

Many thanks - maybe I'll be able to use a "decent database".
I have use DataFlex for many years and am transitioning to SQL and PHP.
This is the first time that I have used a remote server.

what is a decent database to use with PHP, and why ?
bill
MySQL is a perfectly good database for use with PHP. It doesn't have
every bell and whistle some other databases have - but it's free, open
source and widely supported by hosting companies (more than any other
database).

It's just that some Postgres users are biased.

But if you want a *good* database, I wouldn't even consider Postgres.
I'd go straight to DB2, Oracle or SQL Server. Of course, they'll cost
you a few thousand dollars...

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Apr 3 '07 #29

bill wrote:
what is a decent database to use with PHP, and why ?
Of the SQL databases which I've used, I'd put them roughly in this order
of quality:

1. Oracle
=2. PostgreSQL
=2. Microsoft SQL Server
4. MySQL (5.x)
5. Firebird
6. SQLite
7. Microsoft Access

There are plenty of other SQL databases out there of course, but I've not
used them much. Overall, I tend to recommend PostgreSQL because it's free
whereas MS SQL Server and Oracle are both fairly expensive. (They do each
offer a cut-down "express" edition, though this is more suited for a
development environment than full production use.) Also, in my experience
the PHP MS SQL driver can be a bit flaky at times whereas the PostgreSQL
driver seems very reliable.

Nowadays, the gap between PostgreSQL and MySQL is much smaller than it
used to be, but PostgreSQL still offers several advantages over MySQL:

- Good support for foreign keys

- Good support for triggers

- Fuller support for transactions

- More accurate support for SQL standard syntax (e.g. "||"
is the concatenation operator, not an alias for "OR"; double
quotes are used to quote identifiers)

- Server-side functions in a variety of languages (e.g. Python,
Perl, Tcl)

- Boolean data type
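A quick sketch of the syntax and boolean points (hypothetical table; runs as-is on PostgreSQL):

```sql
CREATE TABLE demo (
    name   varchar NOT NULL,
    active boolean NOT NULL DEFAULT true
);
INSERT INTO demo (name) VALUES ('Ann');

-- Standard string concatenation, filtering on a real boolean column:
SELECT 'Hello, ' || name AS greeting FROM demo WHERE active;
-- (MySQL, in its default mode, parses 'a' || 'b' as a logical OR instead.)
```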

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!
Apr 3 '07 #30

Jerry Stuckle wrote:
bill wrote:
>Toby A Inkster wrote:
>>bill wrote:

I am stuck using 4.x because my hosting company does not have PHP5
available, yet.
What would be the php 4 code ?

I'm not really familiar with the MySQL functions in PHP, because I use a
decent database; and it's been a while since I upgraded away from PHP 4,
but it should be something like this:

<?php
$db = mysql_connect(/* connection settings */);
mysql_select_db('some_database', $db);
mysql_query("ALTER TABLE foobar ADD tempcol integer;", $db);
$count = 0;
$rs = mysql_query("SELECT `order` FROM foobar ORDER BY `order`;", $db);
while ($row = mysql_fetch_array($rs))
{
    $count += 10;
    mysql_query(sprintf("UPDATE foobar SET tempcol=%d WHERE `order`=%d;",
        $count, $row['order']), $db);
}
mysql_query("UPDATE foobar SET `order`=tempcol;", $db);
mysql_query("ALTER TABLE foobar DROP tempcol;", $db);
unset($count, $rs);
?>

Many thanks - maybe I'll be able to use a "decent database".
I have use DataFlex for many years and am transitioning to SQL and PHP.
This is the first time that I have used a remote server.

what is a decent database to use with PHP, and why ?
bill

MySQL is a perfectly good database for use with PHP. It doesn't have
every bell and whistle some other databases have - but it's free, open
source and widely supported by hosting companies (more than any other
database).

It's just that some Postgres users are biased.
Hey Jerry, watch it!
Postgres users are NOT biased, they just use the best Open Source DB around,
and they know it.

I am developing an app now. I had to use MySQL because the ISP had no real
database around.
I got so darn sick of it, I HAD to switch to Postgres, which was luckily an
option (after some talk and another ISP).

That is not just a personal opinion, I see a LOT of people coming from
Postgres having serious trouble with MySQL and its infantile support for
<you name it>.

In my humble opinion, MySQL is raising a whole generation of DB-users who
only heard of Foreign Keys and boring stuff like 'database integrity'
during college at best, since mySQL has no idea what all that means.
Unless of course you go through the trouble of upgrading that piece of crap
and use INNODB, then MySQL starts behaving like a database should.

I always wondered: Do the builders of mysql know what the 'relational' in
'relational database' actually means?

MySQL is bad, bad, bad.
>
But if you want a *good* database, I wouldn't even consider Postgres.
I would.
I'd go straight to DB2, Oracle or SQL Server. Of course, they'll cost
you a few thousand dollars...
Yes, true. I used Oracle (a little) and a lot of M$ SQL Server. Both are
really good.

Sorry for the rant.
I am just fed up with MySQL.

Regards,
Erwin Moller
Apr 3 '07 #31

Toby A Inkster wrote:
bill wrote:
>what is a decent database to use with PHP, and why ?

Of the SQL databases which I've used, I'd put them roughly in this order
of quality:

1. Oracle
=2. PostgreSQL
=2. Microsoft SQL Server
4. MySQL (5.x)
5. Firebird
6. SQLite
7. Microsoft Access

There are plenty of other SQL databases out there of course, but I've not
used them much. Overall, I tend to recommend PostgreSQL because it's free
whereas MS SQL Server and Oracle are both fairly expensive. (They do each
offer a cut-down "express" edition, though this is more suited for a
development environment than full production use.) Also, in my experience
the PHP MS SQL driver can be a bit flaky at times whereas the PostgreSQL
driver seems very reliable.

Nowadays, the gap between PostgreSQL and MySQL is much smaller than it
used to be, but PostgreSQL still offers several advantages over MySQL:

- Good support for foreign keys

- Good support for triggers

- Fuller support for transactions

- More accurate support for SQL standard syntax (e.g. "||"
is the concatenate operator, not an alias for "OR"; double
quotes are used to quote identifiers)

- Server-side functions in a variety of languages (e.g. Python,
Perl, Tcl)

- Boolean data type
:-)
I soooo second that opinion.
Go Postgres, and never look back.
It even works on W$ OS (W2000 and up) nowadays.

Regards,
Erwin Moller

Apr 3 '07 #32

Erwin Moller wrote:
Jerry Stuckle wrote:
>bill wrote:
>>Toby A Inkster wrote:
bill wrote:

I am stuck using 4.x because my hosting company does not have PHP5
available, yet.
What would be the php 4 code ?
I'm not really familiar with the MySQL functions in PHP, because I use a
decent database; and it's been a while since I upgraded away from PHP 4,
but it should be something like this:

<?php
$db = mysql_connect(/* connection settings */);
mysql_select_db('some_database', $db);
mysql_query("ALTER TABLE foobar ADD tempcol integer;", $db);
$count = 0;
$rs = mysql_query("SELECT `order` FROM foobar ORDER BY `order`;", $db);
while ($row = mysql_fetch_array($rs))
{
    $count += 10;
    mysql_query(sprintf("UPDATE foobar SET tempcol=%d WHERE `order`=%d;",
        $count, $row['order']), $db);
}
mysql_query("UPDATE foobar SET `order`=tempcol;", $db);
mysql_query("ALTER TABLE foobar DROP tempcol;", $db);
unset($count, $rs);
?>

Many thanks - maybe I'll be able to use a "decent database".
I have use DataFlex for many years and am transitioning to SQL and PHP.
This is the first time that I have used a remote server.

what is a decent database to use with PHP, and why ?
bill
MySQL is a perfectly good database for use with PHP. It doesn't have
every bell and whistle some other databases have - but it's free, open
source and widely supported by hosting companies (more than any other
database).

It's just that some Postgres users are biased.

Hey Jerry, watch it!
Postgres users are NOT biased, they just use the best Open Source DB around,
and they know it.

I am developing an app now. I had to use MySQL because the ISP had no real
database around.
I got so darn sick of it, I HAD to switch to Postgres, which was luckily an
option (after some talk and another ISP).

That is not just a personal opinion, I see a LOT of people coming from
Postgres having serious trouble with MySQL and its infantile support for
<you name it>.

In my humble opinion, MySQL is raising a whole generation of DB-users who
only heard of Foreign Keys and boring stuff like 'database integrity'
during college at best, since mySQL has no idea what all that means.
Unless of course you go through the trouble of upgrading that piece of crap
and use INNODB, then MySQL starts behaving like a database should.

I always wondered: Do the builders of mysql know what the 'relational' in
'relational database' actually means?

MySQL is bad, bad, bad.
>But if you want a *good* database, I wouldn't even consider Postgres.

I would.
>I'd go straight to DB2, Oracle or SQL Server. Of course, they'll cost
you a few thousand dollars...

Yes, true. I used Oracle (a little) and a lot of M$ SQL Server. Both are
really good.

Sorry for the rant.
I am just fed up with MySQL.

Regards,
Erwin Moller
Erwin,

Yes, I've used Postgres in the past also, and it's a decent database.
But so is MySQL. I have no problem using InnoDB tables - they work
fine, and no "upgrade" necessary - just specify you want to use InnoDB
when you create the table (or alter it later).
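For example (the table here is invented):

```sql
-- Create a transactional table from the start:
CREATE TABLE steps (
    `order` SMALLINT NOT NULL PRIMARY KEY,
    descr   VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

-- Or convert an existing MyISAM table in place:
ALTER TABLE steps ENGINE=InnoDB;
```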

And I've used Oracle, SQL Server and DB2 in the past. All are excellent,
albeit expensive, databases. Neither PostGres nor MySQL holds a candle
to any of them.

But MySQL is still the most popular DB around, and there are tons of
applications out there which only use MySQL - for good reason. Because
it is good.

No, it doesn't have all the bells and whistles of PostGres. But many
web applications don't need them. And for those which do, MySQL is
rapidly catching up.

I'm not against PostGres. But I recommend to my customers MySQL because
it is popular, it is well supported, and it is installed on the majority
of the hosting services around.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Apr 3 '07 #33

Erwin Moller wrote:
Toby A Inkster wrote:
>bill wrote:
>>what is a decent database to use with PHP, and why ?
Of the SQL databases which I've used, I'd put them roughly in this order
of quality:

1. Oracle
=2. PostgreSQL
=2. Microsoft SQL Server
4. MySQL (5.x)
5. Firebird
6. SQLite
7. Microsoft Access

There are plenty of other SQL databases out there of course, but I've not
used them much. Overall, I tend to recommend PostgreSQL because it's free
whereas MS SQL Server and Oracle are both fairly expensive. (They do each
offer a cut-down "express" edition, though this is more suited for a
development environment than full production use.) Also, in my experience
the PHP MS SQL driver can be a bit flaky at times whereas the PostgreSQL
driver seems very reliable.

Nowadays, the gap between PostgreSQL and MySQL is much smaller than it
used to be, but PostgreSQL still offers several advantages over MySQL:

- Good support for foreign keys

- Good support for triggers

- Fuller support for transactions

- More accurate support for SQL standard syntax (e.g. "||"
is the concatenate operator, not an alias for "OR"; double
quotes are used to quote identifiers)

- Server-side functions in a variety of languages (e.g. Python,
Perl, Tcl)

- Boolean data type

:-)
I soooo second that opinion.
Go Postgres, and never look back.
It even works on W$ OS (W2000 and up) nowadays.

Regards,
Erwin Moller
Thank you all !

bill
Apr 4 '07 #34
