My question is about a UNION query to deal with an (annoying) JOIN
over
two tables. I am joining over a double column primary key (where the
order of the columns can be changed). This is sooo slooow using a
join, but very fast using a union. How come this is?
Here is more detail...
I have two tables of 'associations' between pairs of objects.
An example table looks like this (and both have the same format) ...
OBJECT_1 | OBJECT_2 | ASSOCIATION_PARAMETER_1 |
ASSOCIATION_PARAMETER_2
A | B | w | x
B | C | y | z
....
PK = (A,B)
I want to join two tables with the above format. The problem is that
one
table my have ...
TABLE_1,
OBJECT_1 = A
OBJECT_2 = B
And the other my have
TABLE_2,
OBJECT_1 = B
OBJECT_2 = A
Please beleive that I have no alternative but to put up with this
situation.
So, I do my join like this (using a sub-query)
SELECT
*
FROM
TABLE_1
INNER JOIN
(
SELECT
OBJECT_1,
OBJECT_2,
ASSOCIATION_PARAMETER_1,
ASSOCIATION_PARAMETER_2
FROM
TABLE_2
#
UNION
#
SELECT
OBJECT_2,
OBJECT_1,
ASSOCIATION_PARAMETER_1,
ASSOCIATION_PARAMETER_2
FROM
TABLE_2
) AS x
USING
(OBJECT_1, OBJECT_2)
;
This executes very fast, creating a 'virtual' table called x, which
combines all the unique rows of the UNION. NB: OBJECT_1 is never equal
to OBJECT_2 in any one row of any table.
The virtual table contains all (distinct) possible orderings of the
pairing OBJECT_1 - OBJECT_2, and the INNER JOIN works only for rows
with
the correct (matching) order of objects.
The above works fine, and is very fast (my primary key is (OBJECT_1,
OBJECT2), and I have an additional index on OBJECT_2).
The above is *bulky*, and when the individual queryies get more
complex it
is a pain to keep the two parts (the two halves of the UNION) 'in
sync' -
identical that is.
However, the following query is *very* slow...
SELECT
*
FROM
TABLE_1
INNER JOIN
TABLE_2
ON
(TABLE_1.OBJECT_1 = TABLE_2.OBJECT_1 AND
TABLE_1.OBJECT_2 = TABLE_2.OBJECT_2) OR
(TABLE_1.OBJECT_1 = TABLE_2.OBJECT_2 AND
TABLE_1.OBJECT_2 = TABLE_2.OBJECT_1)
;
I wan't to know why the first query is very fast, and the second
(which is
more compact and easier to read in my opinion) is soooo slooooow...
Is this an optimizer bug? Should I submit this as a bug?
I also tried this... (again messy)...
SELECT
IF(x.OBJECT_1<x.OBJECT_2,x.OBJECT_1,x.OBJECT_2) AS OBJECT_1,
IF(x.OBJECT_1>x.OBJECT_2,x.OBJECT_1,x.OBJECT_2) AS OBJECT_2,
x.ASSOCIATION_PARAMETER_1,
x.ASSOCIATION_PARAMETER_2,
y.ASSOCIATION_PARAMETER_1,
y.ASSOCIATION_PARAMETER_2,
FROM
TABLE_1 x
INNER JOIN
TABLE_2 y
ON
IF(x.OBJECT_1<x.OBJECT_2,x.OBJECT_1,x.OBJECT_2) =
IF(y.OBJECT_1<y.OBJECT_2,y.OBJECT_1,y.OBJECT_2)
AND
IF(x.OBJECT_1>x.OBJECT_2,x.OBJECT_1,x.OBJECT_2) =
IF(y.OBJECT_1>y.OBJECT_2,y.OBJECT_1,y.OBJECT_2)
;
But this isn't fast either (EXPLAIN looks the same as with the above,
and it runs in about the same time).
Why am I such a fool for SQL? Sorry for wasting your time. All
feedback
welcome,
Dan.