"Mark A" <ma@switchboard .net> writes:
> I don't know what you mean by parallelism of joins and individual buckets.
> But Teradata in the mid 1980's worked pretty much the same way DB2 does
> today (at a conceptual level). The table was spread across multiple
> partitions based on a hash key, and each partition processed the data in
> parallel. Cross partition joins were supported.
When you decluster relations across nodes by, say, hash partitioning,
then to join two relations you need only join the corresponding fragments,
provided both are hashed on the join attributes in the declustering.
Here I'm using the term fragment for the set of tuples for a relation
stored at a single node after declustering.
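
To make the fragment idea concrete, here is a minimal Python sketch of hash declustering followed by fragment-wise joins. It is an illustration only, not how Teradata, DB2, or GAMMA implement it: the names NUM_NODES, node_of, decluster, local_join and declustered_join are made up for this example, keys are assumed to be integers, and the "nodes" are just Python lists.

```python
from collections import defaultdict

NUM_NODES = 4  # hypothetical number of nodes in the cluster


def node_of(key):
    # Declustering hash function (illustrative; assumes integer keys).
    return key % NUM_NODES


def decluster(relation, key_index):
    # Spread the tuples of a relation across nodes by hashing the join attribute.
    fragments = [[] for _ in range(NUM_NODES)]
    for tup in relation:
        fragments[node_of(tup[key_index])].append(tup)
    return fragments


def local_join(r_frag, s_frag, r_key, s_key):
    # Join one pair of corresponding fragments with a simple in-memory hash join.
    table = defaultdict(list)
    for r in r_frag:
        table[r[r_key]].append(r)
    return [r + s for s in s_frag for r in table.get(s[s_key], [])]


def declustered_join(r, s, r_key=0, s_key=0):
    # Matching tuples always land on the same node, so only fragment i of R
    # ever needs to meet fragment i of S.
    r_frags = decluster(r, r_key)
    s_frags = decluster(s, s_key)
    result = []
    for r_frag, s_frag in zip(r_frags, s_frags):
        result.extend(local_join(r_frag, s_frag, r_key, s_key))
    return result
```

In a real system each iteration of the loop in declustered_join would run on its own node against its own disks; the loop here only shows that no cross-node comparison is needed.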
You can further parallelize the join of a pair of fragments by using
a hash-join algorithm, but using a different hash function than
was used to decluster the relations. This further partitions each
fragment into buckets that can be joined independently of each other.
Each of these bucket joins can be started on a separate processor,
enabling the steps of a hash join of fragments to be computed in parallel.
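
A rough sketch of that second level of partitioning, again in Python with made-up names (NUM_BUCKETS, bucket_of, split_into_buckets, join_bucket, parallel_fragment_join) and integer keys in position 0 of each tuple. The bucket hash is deliberately different from the declustering hash (assumed here to be key % NUM_NODES); reusing the same function would put every tuple of a fragment into a single bucket. A thread pool stands in for separate processors, so in CPython this shows the structure rather than real CPU speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4    # hypothetical; the declustering hash is assumed to be key % NUM_NODES
NUM_BUCKETS = 3  # hypothetical number of buckets per fragment


def bucket_of(key):
    # Second hash function, chosen to differ from the declustering hash.
    return (key // NUM_NODES) % NUM_BUCKETS


def split_into_buckets(fragment):
    # Partition one fragment into buckets on the join key (tuple position 0).
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for tup in fragment:
        buckets[bucket_of(tup[0])].append(tup)
    return buckets


def join_bucket(pair):
    # Build a hash table on the R bucket, then probe it with the S bucket.
    r_bucket, s_bucket = pair
    table = defaultdict(list)
    for r in r_bucket:
        table[r[0]].append(r)
    return [r + s for s in s_bucket for r in table.get(s[0], [])]


def parallel_fragment_join(r_frag, s_frag):
    # Bucket i of R can only match bucket i of S, so the bucket joins are
    # independent and can be handed to separate workers.
    pairs = zip(split_into_buckets(r_frag), split_into_buckets(s_frag))
    with ThreadPoolExecutor(max_workers=NUM_BUCKETS) as pool:
        parts = list(pool.map(join_bucket, pairs))
    return [row for part in parts for row in part]
```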
The initial declustering of relations enables I/O parallelism, but
the hashing with a separate function to compute hash joins of each
pair of fragments enables CPU parallelism. Teradata did not support this
in the mid 1980s. It did use a sort-merge join algorithm in which each
fragment can be partitioned into smaller pieces to be sorted in parallel,
but the merge operation is single-threaded so this join algorithm is not
as parallelizable as a hash-join.
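
For contrast, a small Python sketch of the sort-merge shape just described: the pieces of a fragment are sorted in parallel, but the sorted runs are then merged and joined by a single thread. This is only meant to show where the serial step sits, not to reproduce Teradata's implementation; sort_in_pieces and merge_join are hypothetical names, the join key is assumed to be in position 0 of each tuple, and a thread pool again stands in for separate processors.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def sort_in_pieces(fragment, num_pieces=2):
    # Split the fragment into pieces and sort each piece on its own worker.
    pieces = [fragment[i::num_pieces] for i in range(num_pieces)]
    with ThreadPoolExecutor(max_workers=num_pieces) as pool:
        runs = list(pool.map(sorted, pieces))
    # Combining the sorted runs is done by one thread.
    return list(heapq.merge(*runs))


def merge_join(r_sorted, s_sorted):
    # Single-threaded merge join of two inputs sorted on the key in position 0.
    out, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        rk, sk = r_sorted[i][0], s_sorted[j][0]
        if rk < sk:
            i += 1
        elif rk > sk:
            j += 1
        else:
            # Pair the current R tuple with every S tuple sharing its key.
            k = j
            while k < len(s_sorted) and s_sorted[k][0] == rk:
                out.append(r_sorted[i] + s_sorted[k])
                k += 1
            i += 1
    return out
```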
The first GAMMA paper describing the system was published and presented at
the VLDB 1986 conference, meaning the paper was submitted in fall 1985. It
would have been a working system by then, so I'd say the shared-nothing
parallel DB architecture was independently developed by Teradata and
DeWitt's research group, with the latter more highly developed
technologically. Of course, Teradata had to spend more time getting
the code to production standards, whereas a research group can just build a prototype
as proof of concept.
Cheers,
Joseph