"Geoff Berrow" <bl******@ckdog.co.uk> wrote in message
news:tf********************************@4ax.com...
I noticed that Message-ID: <c2*************@ID-30799.news.uni-berlin.de>
from Agelmar contained the following:
As for ordering records on a disk
using a hashing algorithm - I've never seen that before... I don't think
youwould ever want to do that, either. Ordering on the attributes that are a
function of the hash should have the same effect, and still provide some
utility for range predicates. The entire purpose of physically storing
thetuples in a sorted order is so that when you retrieve a set of tuples
with aparticular (range of) values, you perform sequential I/O, issuing one
request to grab all the blocks, rather than one request per tuple and
thrashing the disk.
P.s. this is my area of research :-) (DBMSs)
I could tell, the only other person I ever heard call them tuples was my
university lecturer. I'll have to look out my old notes, but the
hashing stuff sticks in my memory. Maybe I misunderstood it. I did
find this on the 'net though:-
File Organization Techniques
Hashing - Each record is placed in the database at a location whose
address is computed as some function (hash function) of that record
(hash field). To store the record, the DBMS computes the hash address
for the new record and instructs the file manager to place the record at
that position
I worked on a database system many years ago on the HP3000 minicomputer
called IMAGE (later called TurboIMAGE) which used a hashing algorithm to
locate records in a table. This algorithm used only the primary key, not the
whole record, and produced a record number or address. As each record in a
table was fixed to the same length (variable length fields were not
supported, and empty fields were filled with null values) each record number
could be used to work out a physical address on the disk. A capacity was
defined for each table so that it could assign the space necessary to hold
that number of records. The capacity value could be changed, but this meant
rebuilding the table.
There were two approaches in this hashing algorithm:
(a) If the primary key was numeric and the key value was less than the
capacity of the table then the key values was used as the address.
Consequently each record had a fixed location in the table, and a sequential
read would retrieve the records in primary key sequence.
(b) If the primary key was numeric but greater in value than the capacity of
the table, or the primary key was not numeric, then the hashing algorithm
would produce an address somewhere else in the table space, and this address
would be used instead of the primary key value. The output from the hashing
algorithm was known as the "primary address".
(c) It was possible that different primary key values could hash to the same
"primary address", in which case the first record at that address was
allowed to stay at that address, but all others were moved to the nearest
available empty slots, known as a "secondary address". A chain of pointers
to these "secondary addresses" was maintained at the primary address.
(d) When performing a lookup using a primary key the hashing algorithm would
compute a primary address, and the DBMS would read the record at that
address. If the primary key of that record did not match the given primary
key then the DBMS would read down the chain of secondaries until it found
it, or would return "key not found".
(e) A problem arose when inserting a record whose primary key hashed to an
address which was already taken up by a "secondary". In this case the
secondary record would have to be moved to a different empty slot so that
the space could be used by the primary record, and the chain of secondary
addresses would have to be updated to reflect this movement. The insertion
of records could therefore be slowed down by this movement of secondary
records, which was given the term "migrating secondaries".
The whole point of this is to say that unless you know the inner workings of
your DBMS you cannot predict where each record will be written in the table
space, therefore you cannot predict the order in which records will be
presented to you when doing a sequential read. The only thing that you can
predict is that, barring deletions and insertions, the order will be exactly
the same whenever you perform another sequential read.
Here endeth the lesson.
--
Tony Marston
http://www.tonymarston.net