By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,496 Members | 1,527 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,496 IT Pros & Developers. It's quick & easy.

Outer Join - shifting result set.

P: n/a
I'm a quantitative securities analyst working with Compustat data
(company fiscal reports and pricing feeds).

My coworker came across a problem that we fixed, but I'd like to
understand 'why' it was happening and just don't get it yet.

Here's the starting query (reduced to simple prefixes):
----INITIAL-----

declare @mthDate datetime
set @mthDate = (select max(datadate) from t)
declare @wkDate datetime
set @wkDate = (select max(datadate) from z)

Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.date2
left join g on g.idA = e.idA and g.datadate=e.date2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null

----END INITIAL-----------

As you can see 'z' is the main table that things are linked to via
outer joins (our security master). Table 'k' has a list of securities
that we wish not to have results for.

There are 77 entries in table k and 4933 in table z for that given
time. We'd expect 4856 to be in this, but no. it's 4400, and then the
next time you run it (no changes whatsover) it's 2312, and so on.
Every time you execute you get a different record count.

My thought/and fix was to move the (k.ticker) predicate out of the
where clause and get a differenced set from z using NOT EXISTS:
-----AMENDED---------------
from
(z where not exists(select * from k where k.ticker=y.ticker)) y
left join a on a.idA = y.idA and a.idB = y.idB
and a.datadate = y.datadate
left join b on b.idA = y.idA and b.idB = y.idB
and b.datadate = @mthDate
left join c on c.idA = y.idA and c.idB = y.idB
and c.datadate = @mthDate
left join d on d.idA = y.idA and d.idB = y.idB
and d.datadate = y.datadate
left join e on e.idA = y.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.date2
left join g on g.idA = e.idA and g.datadate=e.date2
left join h on h.idA = y.idA
left join k on k.ticker = y.ticker
left join m on m.idA = y.idA and m.idB=y.idB
where
y.datadate = @wkDate
<..some other expression filters...>

------------------------

And this works. It's stable now.

I'm hoping someone here can help me up the wisdom curve by explaining
to me 'why' the recordset kept changing before.

My guess is that the cost-based optimizer was resorting the outer joins
and handing back different sets as a result, but i want to understand,
and thought i'd come to this group for help.

I appreciate your time and look forward to replies.

Greg McIntire

Jul 23 '05 #1
Share this Question
Share on Google+
7 Replies


P: n/a
Greg,

it sounds like a bug to me. If the data doesn't change, then the query
result shouldn't change.

What version and service pack are you running? Can you create a
(simplified?) script that reproceduces the problem?

Gert-Jan
Greg wrote:

I'm a quantitative securities analyst working with Compustat data
(company fiscal reports and pricing feeds).

My coworker came across a problem that we fixed, but I'd like to
understand 'why' it was happening and just don't get it yet.

Here's the starting query (reduced to simple prefixes):

----INITIAL-----

declare @mthDate datetime
set @mthDate = (select max(datadate) from t)
declare @wkDate datetime
set @wkDate = (select max(datadate) from z)

Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.date2
left join g on g.idA = e.idA and g.datadate=e.date2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null

----END INITIAL-----------

As you can see 'z' is the main table that things are linked to via
outer joins (our security master). Table 'k' has a list of securities
that we wish not to have results for.

There are 77 entries in table k and 4933 in table z for that given
time. We'd expect 4856 to be in this, but no. it's 4400, and then the
next time you run it (no changes whatsover) it's 2312, and so on.
Every time you execute you get a different record count.

My thought/and fix was to move the (k.ticker) predicate out of the
where clause and get a differenced set from z using NOT EXISTS:

-----AMENDED---------------
from
(z where not exists(select * from k where k.ticker=y.ticker)) y
left join a on a.idA = y.idA and a.idB = y.idB
and a.datadate = y.datadate
left join b on b.idA = y.idA and b.idB = y.idB
and b.datadate = @mthDate
left join c on c.idA = y.idA and c.idB = y.idB
and c.datadate = @mthDate
left join d on d.idA = y.idA and d.idB = y.idB
and d.datadate = y.datadate
left join e on e.idA = y.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.date2
left join g on g.idA = e.idA and g.datadate=e.date2
left join h on h.idA = y.idA
left join k on k.ticker = y.ticker
left join m on m.idA = y.idA and m.idB=y.idB
where
y.datadate = @wkDate
<..some other expression filters...>

------------------------

And this works. It's stable now.

I'm hoping someone here can help me up the wisdom curve by explaining
to me 'why' the recordset kept changing before.

My guess is that the cost-based optimizer was resorting the outer joins
and handing back different sets as a result, but i want to understand,
and thought i'd come to this group for help.

I appreciate your time and look forward to replies.

Greg McIntire

Jul 23 '05 #2

P: n/a
It's Version 8.00.194

While my ego would like to say it's not me, it's Microsoft, I'm pretty
doubtful. I think it's much more likely that it's a mental mistake....

Greg

Jul 23 '05 #3

P: n/a
Well, could be, but I would install a service pack anyway. 8.00.194
means you have not installed any SQL-Server service pack, which
basically means you are missing all bug fixes of the last 5 years...

Gert-Jan
Greg wrote:

It's Version 8.00.194

While my ego would like to say it's not me, it's Microsoft, I'm pretty
doubtful. I think it's much more likely that it's a mental mistake....

Greg

Jul 23 '05 #4

P: n/a
alright will talk to our tech guy. thanks a bunch.

Greg

Jul 23 '05 #5

P: n/a
Greg (ja******@hotmail.com) writes:
Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.date2
left join g on g.idA = e.idA and g.datadate=e.date2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null

The part with e, f and g looks suspicious to me. Not if I can tell
whether they are the cause of your SELECT:s returning a different
number. But you might want to have said:

left join (e
join f on f.idA = e.idA and f.datadate=e.date2
join g on g.idA = e.idA and g.datadate=e.date2)
on e.idA = z.idA
and e.datadate = @mthDate
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #6

P: n/a

"Greg" <ja******@hotmail.com> wrote in message
news:11**********************@l41g2000cwc.googlegr oups.com...
I'm a quantitative securities analyst working with Compustat data
(company fiscal reports and pricing feeds).

My coworker came across a problem that we fixed, but I'd like to
understand 'why' it was happening and just don't get it yet.

Here's the starting query (reduced to simple prefixes):
----INITIAL-----

declare @mthDate datetime
set @mthDate = (select max(datadate) from t)
declare @wkDate datetime
set @wkDate = (select max(datadate) from z)

Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.date2
left join g on g.idA = e.idA and g.datadate=e.date2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null

----END INITIAL-----------

As you can see 'z' is the main table that things are linked to via
outer joins (our security master). Table 'k' has a list of securities
that we wish not to have results for.

There are 77 entries in table k and 4933 in table z for that given
time. We'd expect 4856 to be in this, but no. it's 4400, and then the
next time you run it (no changes whatsover) it's 2312, and so on.
Every time you execute you get a different record count.

My thought/and fix was to move the (k.ticker) predicate out of the
where clause and get a differenced set from z using NOT EXISTS:
-----AMENDED---------------
from
(z where not exists(select * from k where k.ticker=y.ticker)) y
left join a on a.idA = y.idA and a.idB = y.idB
and a.datadate = y.datadate
left join b on b.idA = y.idA and b.idB = y.idB
and b.datadate = @mthDate
left join c on c.idA = y.idA and c.idB = y.idB
and c.datadate = @mthDate
left join d on d.idA = y.idA and d.idB = y.idB
and d.datadate = y.datadate
left join e on e.idA = y.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.date2
left join g on g.idA = e.idA and g.datadate=e.date2
left join h on h.idA = y.idA
left join k on k.ticker = y.ticker
left join m on m.idA = y.idA and m.idB=y.idB
where
y.datadate = @wkDate
<..some other expression filters...>

------------------------

And this works. It's stable now.

I'm hoping someone here can help me up the wisdom curve by explaining
to me 'why' the recordset kept changing before.

My guess is that the cost-based optimizer was resorting the outer joins
and handing back different sets as a result, but i want to understand,
and thought i'd come to this group for help.

I appreciate your time and look forward to replies.

Greg McIntire


Your guess sounds pretty reasonable to me.

If you use the show query plan setting in Query Analyzer, you can look and
see if the plan selected changes from execution to execution.

As a general rule ... I would take the variable comparison filtering out of
the joining and into the where clause.

Jul 23 '05 #7

P: n/a
> As a general rule ... I would take the variable comparison filtering out of
the joining and into the where clause.


You can't do that with outer joins, because that changes the meaning
(i.e. the resultset).

Gert-Jan
Jul 23 '05 #8

This discussion thread is closed

Replies have been disabled for this discussion.