469,918 Members | 2,256 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,918 developers. It's quick & easy.

UNION and optimization?

My question is about a UNION query to deal with an (annoying) JOIN
over
two tables. I am joining over a double column primary key (where the
order of the columns can be changed). This is sooo slooow using a
join, but very fast using a union. How come this is?

Here is more detail...

I have two tables of 'associations' between pairs of objects.

An example table looks like this (and both have the same format) ...

OBJECT_1 | OBJECT_2 | ASSOCIATION_PARAMETER_1 |
ASSOCIATION_PARAMETER_2
A | B | w | x
B | C | y | z
....

PK = (A,B)

I want to join two tables with the above format. The problem is that
one
table my have ...

TABLE_1,
OBJECT_1 = A
OBJECT_2 = B
And the other my have

TABLE_2,
OBJECT_1 = B
OBJECT_2 = A
Please beleive that I have no alternative but to put up with this
situation.
So, I do my join like this (using a sub-query)

SELECT
*
FROM
TABLE_1
INNER JOIN
(
SELECT
OBJECT_1,
OBJECT_2,
ASSOCIATION_PARAMETER_1,
ASSOCIATION_PARAMETER_2
FROM
TABLE_2
#
UNION
#
SELECT
OBJECT_2,
OBJECT_1,
ASSOCIATION_PARAMETER_1,
ASSOCIATION_PARAMETER_2
FROM
TABLE_2
) AS x
USING
(OBJECT_1, OBJECT_2)
;

This executes very fast, creating a 'virtual' table called x, which
combines all the unique rows of the UNION. NB: OBJECT_1 is never equal
to OBJECT_2 in any one row of any table.

The virtual table contains all (distinct) possible orderings of the
pairing OBJECT_1 - OBJECT_2, and the INNER JOIN works only for rows
with
the correct (matching) order of objects.

The above works fine, and is very fast (my primary key is (OBJECT_1,
OBJECT2), and I have an additional index on OBJECT_2).

The above is *bulky*, and when the individual queryies get more
complex it
is a pain to keep the two parts (the two halves of the UNION) 'in
sync' -
identical that is.

However, the following query is *very* slow...

SELECT
*
FROM
TABLE_1
INNER JOIN
TABLE_2
ON
(TABLE_1.OBJECT_1 = TABLE_2.OBJECT_1 AND
TABLE_1.OBJECT_2 = TABLE_2.OBJECT_2) OR
(TABLE_1.OBJECT_1 = TABLE_2.OBJECT_2 AND
TABLE_1.OBJECT_2 = TABLE_2.OBJECT_1)
;
I wan't to know why the first query is very fast, and the second
(which is
more compact and easier to read in my opinion) is soooo slooooow...

Is this an optimizer bug? Should I submit this as a bug?
I also tried this... (again messy)...

SELECT
IF(x.OBJECT_1<x.OBJECT_2,x.OBJECT_1,x.OBJECT_2) AS OBJECT_1,
IF(x.OBJECT_1>x.OBJECT_2,x.OBJECT_1,x.OBJECT_2) AS OBJECT_2,
x.ASSOCIATION_PARAMETER_1,
x.ASSOCIATION_PARAMETER_2,
y.ASSOCIATION_PARAMETER_1,
y.ASSOCIATION_PARAMETER_2,
FROM
TABLE_1 x
INNER JOIN
TABLE_2 y
ON
IF(x.OBJECT_1<x.OBJECT_2,x.OBJECT_1,x.OBJECT_2) =
IF(y.OBJECT_1<y.OBJECT_2,y.OBJECT_1,y.OBJECT_2)
AND
IF(x.OBJECT_1>x.OBJECT_2,x.OBJECT_1,x.OBJECT_2) =
IF(y.OBJECT_1>y.OBJECT_2,y.OBJECT_1,y.OBJECT_2)
;

But this isn't fast either (EXPLAIN looks the same as with the above,
and it runs in about the same time).

Why am I such a fool for SQL? Sorry for wasting your time. All
feedback
welcome,

Dan.
Jul 20 '05 #1
1 1830
dmb000006 wrote:
I want to join two tables with the above format. The problem is that
one table may have ...

TABLE_1,
OBJECT_1 = A
OBJECT_2 = B

And the other may have

TABLE_2,
OBJECT_1 = B
OBJECT_2 = A


Make sure you have indexes on OBJECT_1 and OBJECT_2 in both tables.
Even if you have a primary key index defined over the two columns, try
putting an additional index on just the OBJECT_2 column.

I'm thinking that the slowness is due to an unindexed search through the
OBJECT_2 columns. This might be a case that is a little too obscure for
the optimizer to know that it can benefit from the compound index.

Also, here's another possible design to make the query more compact,
similar to your IF() technique:

SELECT LEAST(x.OBJECT_1, x.OBJECT_2) AS OBJECT_1,
GREATEST(x.OBJECT_1, x.OBJECT_2) AS OBJECT_2
FROM TABLE_1 x INNER JOIN TABLE_2 y ON
(LEAST(x.OBJECT_1, x.OBJECT_2) = LEAST(y.OBJECT_1, y.OBJECT_2) AND
GREATEST(x.OBJECT_1, x.OBJECT_2) = GREATEST(y.OBJECT_1, y.OBJECT_2))

Regards,
Bill K.
Jul 20 '05 #2

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

1 post views Thread by Bryan Parkoff | last post: by
6 posts views Thread by Eugene | last post: by
8 posts views Thread by lyn.duong | last post: by
13 posts views Thread by Razmig K | last post: by
84 posts views Thread by Peter Olcott | last post: by
2 posts views Thread by jafastinger | last post: by
29 posts views Thread by Richard Harter | last post: by
18 posts views Thread by Bryan Parkoff | last post: by
1 post views Thread by Waqarahmed | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.