By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,842 Members | 2,368 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,842 IT Pros & Developers. It's quick & easy.

Multicolumn correlated subquery?

P: n/a
Hi,

I have a history table with about 400 million rows, with a unique
composite nonclustered index on two columns (object id and time period)
which is used for most of the queries into the table, and an identity
column for the clustered primary key.

Many of my queries use correlated subqueries to pull unique history
rows from the history table for each of a set of objects from the
object table, for instance, pulling the earliest history row for each
object in a set. These correlated subqueries reference the object table
and return the primary key of the history table, e.g.:

select *
from lp_object l
inner join lp_object_history h
on h.lp_object_id = l.lp_id

where l.lp_set_id = 'SOME_LITERAL'

and h.lp_id = (
select top 1 lp_id
from lp_object_history
where lp_object_id = l.lp_id
and lp_some_column > 0
order by lp_time_period)

Now, if lp_some_column is not indexed, this query has no choice but to
read the entirety of every single history row for every object in the
set where lp_set_id = 'SOME_LITERAL', so that it can determine if
lp_some_column > 0, and because the history table is clustered by the
identity column rather than the ID of the relevant object whose history
we're tracking, the reads take forever - they have to bop all around
the disk. The sets I deal with tend to have about 5K objects in them
and about 200K associated history rows.

I'm considering reclustering by the (object id, time period) index, but
then my queries will need an extra bookmark lookup step to get the row
data from the identity value returned by the correlated subquery. I
think it will still be faster, though, so I will probably build a copy
of the table with the alternative clustering scheme to run some
performance tests.

What I'm wondering is, if I were to dispense with the identity column
altogether and replace it with a composite primary key of (object id,
time period), would I be still be able to use my correlated subqueries?
Because then there wouldn't be a single column that uniquely identifies
each row in the history table and I don't think SQL Server supports
multicolumn correlated subqueries.

Thanks for reading,
Seth

May 10 '06 #1
Share this Question
Share on Google+
4 Replies


P: n/a
sql_server_user (ka*******@gmail.com) writes:
Many of my queries use correlated subqueries to pull unique history
rows from the history table for each of a set of objects from the
object table, for instance, pulling the earliest history row for each
object in a set. These correlated subqueries reference the object table
and return the primary key of the history table, e.g.:

select *
from lp_object l
inner join lp_object_history h
on h.lp_object_id = l.lp_id

where l.lp_set_id = 'SOME_LITERAL'

and h.lp_id = (
select top 1 lp_id
from lp_object_history
where lp_object_id = l.lp_id
and lp_some_column > 0
order by lp_time_period)
...
I'm considering reclustering by the (object id, time period) index, but
then my queries will need an extra bookmark lookup step to get the row
data from the identity value returned by the correlated subquery. I
think it will still be faster, though, so I will probably build a copy
of the table with the alternative clustering scheme to run some
performance tests.

What I'm wondering is, if I were to dispense with the identity column
altogether and replace it with a composite primary key of (object id,
time period), would I be still be able to use my correlated subqueries?
Because then there wouldn't be a single column that uniquely identifies
each row in the history table and I don't think SQL Server supports
multicolumn correlated subqueries.


Unless I'm missing something, the query without the IDENTITY column would
be:

select *
from lp_object l
inner join lp_object_history h on h.lp_object_id = l.lp_id
where l.lp_set_id = 'SOME_LITERAL'
and h.lp_time_period = (select max(h2.lp_time_period)
from lp_object_history h2
where h2.lp_object_id = l.lp_id
and h2.lp_some_column > 0)

Whether this will actually perform that much better I don't know, but
I can't see anything good coming through that IDENTITY column.

You may also want to try the effect of changíng

where h2.lp_object_id = l.lp_id

to

where h2.lp_object_id = h.lp_object_id

or even

where h2.lp_object_id = l.lp_id
and h2.lp_object_id = h.lp_object_id

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
May 10 '06 #2

P: n/a
Thanks Erland.

The query as you've rewritten it is right on, as long as we have the
object id specified in the outer part of the query so that the history
row is fully specified when given the time period from the subquery. I
guess what I was concerned about was correlated subqueries where this
isn't the case, but I can't think of any actual examples for this
application, so maybe I don't have to worry about it.

I will definitely try the other permutations of the query you
suggested.

Seth

May 10 '06 #3

P: n/a
I reclustered the table and the query is of course much faster now. I
need to keep both versions of the table around for a while to see how
they compare for all the queries we run, so I have an insert trigger
keeping them in sync. I tried the other permutations of the query you
suggested, Erland, and they had no effect on the execution plan or
performance.

Thanks,
Seth

May 11 '06 #4

P: n/a
sql_server_user (ka*******@gmail.com) writes:
I reclustered the table and the query is of course much faster now. I
need to keep both versions of the table around for a while to see how
they compare for all the queries we run, so I have an insert trigger
keeping them in sync.
Good to hear that it worked out.
I tried the other permutations of the query you suggested, Erland, and
they had no effect on the execution plan or performance.


Good! This indicates that the query might be stable. :-) I've experienced
situations where such "meaningless" permuation affected the query plan,
so I wanted you to try all three.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
May 11 '06 #5

This discussion thread is closed

Replies have been disabled for this discussion.