473,320 Members | 1,952 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

Outer Join - shifting result set.

I'm a quantitative securities analyst working with Compustat data
(company fiscal reports and pricing feeds).

My coworker came across a problem that we fixed, but I'd like to
understand 'why' it was happening and just don't get it yet.

Here's the starting query (reduced to simple prefixes):
----INITIAL-----

declare @mthDate datetime
set @mthDate = (select max(datadate) from t)
declare @wkDate datetime
set @wkDate = (select max(datadate) from z)

Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.date2
left join g on g.idA = e.idA and g.datadate=e.date2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null

----END INITIAL-----------

As you can see 'z' is the main table that things are linked to via
outer joins (our security master). Table 'k' has a list of securities
that we wish not to have results for.

There are 77 entries in table k and 4933 in table z for that given
time. We'd expect 4856 to be in this, but no. it's 4400, and then the
next time you run it (no changes whatsover) it's 2312, and so on.
Every time you execute you get a different record count.

My thought/and fix was to move the (k.ticker) predicate out of the
where clause and get a differenced set from z using NOT EXISTS:
-----AMENDED---------------
from
(z where not exists(select * from k where k.ticker=y.ticker)) y
left join a on a.idA = y.idA and a.idB = y.idB
and a.datadate = y.datadate
left join b on b.idA = y.idA and b.idB = y.idB
and b.datadate = @mthDate
left join c on c.idA = y.idA and c.idB = y.idB
and c.datadate = @mthDate
left join d on d.idA = y.idA and d.idB = y.idB
and d.datadate = y.datadate
left join e on e.idA = y.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.date2
left join g on g.idA = e.idA and g.datadate=e.date2
left join h on h.idA = y.idA
left join k on k.ticker = y.ticker
left join m on m.idA = y.idA and m.idB=y.idB
where
y.datadate = @wkDate
<..some other expression filters...>

------------------------

And this works. It's stable now.

I'm hoping someone here can help me up the wisdom curve by explaining
to me 'why' the recordset kept changing before.

My guess is that the cost-based optimizer was resorting the outer joins
and handing back different sets as a result, but i want to understand,
and thought i'd come to this group for help.

I appreciate your time and look forward to replies.

Greg McIntire

Jul 23 '05 #1
7 1659
Greg,

it sounds like a bug to me. If the data doesn't change, then the query
result shouldn't change.

What version and service pack are you running? Can you create a
(simplified?) script that reproceduces the problem?

Gert-Jan
Greg wrote:

I'm a quantitative securities analyst working with Compustat data
(company fiscal reports and pricing feeds).

My coworker came across a problem that we fixed, but I'd like to
understand 'why' it was happening and just don't get it yet.

Here's the starting query (reduced to simple prefixes):

----INITIAL-----

declare @mthDate datetime
set @mthDate = (select max(datadate) from t)
declare @wkDate datetime
set @wkDate = (select max(datadate) from z)

Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.date2
left join g on g.idA = e.idA and g.datadate=e.date2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null

----END INITIAL-----------

As you can see 'z' is the main table that things are linked to via
outer joins (our security master). Table 'k' has a list of securities
that we wish not to have results for.

There are 77 entries in table k and 4933 in table z for that given
time. We'd expect 4856 to be in this, but no. it's 4400, and then the
next time you run it (no changes whatsover) it's 2312, and so on.
Every time you execute you get a different record count.

My thought/and fix was to move the (k.ticker) predicate out of the
where clause and get a differenced set from z using NOT EXISTS:

-----AMENDED---------------
from
(z where not exists(select * from k where k.ticker=y.ticker)) y
left join a on a.idA = y.idA and a.idB = y.idB
and a.datadate = y.datadate
left join b on b.idA = y.idA and b.idB = y.idB
and b.datadate = @mthDate
left join c on c.idA = y.idA and c.idB = y.idB
and c.datadate = @mthDate
left join d on d.idA = y.idA and d.idB = y.idB
and d.datadate = y.datadate
left join e on e.idA = y.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.date2
left join g on g.idA = e.idA and g.datadate=e.date2
left join h on h.idA = y.idA
left join k on k.ticker = y.ticker
left join m on m.idA = y.idA and m.idB=y.idB
where
y.datadate = @wkDate
<..some other expression filters...>

------------------------

And this works. It's stable now.

I'm hoping someone here can help me up the wisdom curve by explaining
to me 'why' the recordset kept changing before.

My guess is that the cost-based optimizer was resorting the outer joins
and handing back different sets as a result, but i want to understand,
and thought i'd come to this group for help.

I appreciate your time and look forward to replies.

Greg McIntire

Jul 23 '05 #2
It's Version 8.00.194

While my ego would like to say it's not me, it's Microsoft, I'm pretty
doubtful. I think it's much more likely that it's a mental mistake....

Greg

Jul 23 '05 #3
Well, could be, but I would install a service pack anyway. 8.00.194
means you have not installed any SQL-Server service pack, which
basically means you are missing all bug fixes of the last 5 years...

Gert-Jan
Greg wrote:

It's Version 8.00.194

While my ego would like to say it's not me, it's Microsoft, I'm pretty
doubtful. I think it's much more likely that it's a mental mistake....

Greg

Jul 23 '05 #4
alright will talk to our tech guy. thanks a bunch.

Greg

Jul 23 '05 #5
Greg (ja******@hotmail.com) writes:
Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.date2
left join g on g.idA = e.idA and g.datadate=e.date2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null

The part with e, f and g looks suspicious to me. Not if I can tell
whether they are the cause of your SELECT:s returning a different
number. But you might want to have said:

left join (e
join f on f.idA = e.idA and f.datadate=e.date2
join g on g.idA = e.idA and g.datadate=e.date2)
on e.idA = z.idA
and e.datadate = @mthDate
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #6

"Greg" <ja******@hotmail.com> wrote in message
news:11**********************@l41g2000cwc.googlegr oups.com...
I'm a quantitative securities analyst working with Compustat data
(company fiscal reports and pricing feeds).

My coworker came across a problem that we fixed, but I'd like to
understand 'why' it was happening and just don't get it yet.

Here's the starting query (reduced to simple prefixes):
----INITIAL-----

declare @mthDate datetime
set @mthDate = (select max(datadate) from t)
declare @wkDate datetime
set @wkDate = (select max(datadate) from z)

Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.date2
left join g on g.idA = e.idA and g.datadate=e.date2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null

----END INITIAL-----------

As you can see 'z' is the main table that things are linked to via
outer joins (our security master). Table 'k' has a list of securities
that we wish not to have results for.

There are 77 entries in table k and 4933 in table z for that given
time. We'd expect 4856 to be in this, but no. it's 4400, and then the
next time you run it (no changes whatsover) it's 2312, and so on.
Every time you execute you get a different record count.

My thought/and fix was to move the (k.ticker) predicate out of the
where clause and get a differenced set from z using NOT EXISTS:
-----AMENDED---------------
from
(z where not exists(select * from k where k.ticker=y.ticker)) y
left join a on a.idA = y.idA and a.idB = y.idB
and a.datadate = y.datadate
left join b on b.idA = y.idA and b.idB = y.idB
and b.datadate = @mthDate
left join c on c.idA = y.idA and c.idB = y.idB
and c.datadate = @mthDate
left join d on d.idA = y.idA and d.idB = y.idB
and d.datadate = y.datadate
left join e on e.idA = y.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.date2
left join g on g.idA = e.idA and g.datadate=e.date2
left join h on h.idA = y.idA
left join k on k.ticker = y.ticker
left join m on m.idA = y.idA and m.idB=y.idB
where
y.datadate = @wkDate
<..some other expression filters...>

------------------------

And this works. It's stable now.

I'm hoping someone here can help me up the wisdom curve by explaining
to me 'why' the recordset kept changing before.

My guess is that the cost-based optimizer was resorting the outer joins
and handing back different sets as a result, but i want to understand,
and thought i'd come to this group for help.

I appreciate your time and look forward to replies.

Greg McIntire


Your guess sounds pretty reasonable to me.

If you use the show query plan setting in Query Analyzer, you can look and
see if the plan selected changes from execution to execution.

As a general rule ... I would take the variable comparison filtering out of
the joining and into the where clause.

Jul 23 '05 #7
> As a general rule ... I would take the variable comparison filtering out of
the joining and into the where clause.


You can't do that with outer joins, because that changes the meaning
(i.e. the resultset).

Gert-Jan
Jul 23 '05 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
by: Martin | last post by:
I am now working on SQL Server 2000 having had previous experience on a different database. Both of the OUTER JOIN syntaxes is different from what I am used to and I am finding it slightly...
4
by: Omavlana | last post by:
Hi, I need your help to resolve this problem. I have written a right outer join query between 2 indipendent tables as follows. select b.Account_desc, b.Account, a.CSPL_CSPL from...
0
by: Preston Landers | last post by:
Hello all. I am trying to write a query that "just" switches some data around so it is shown in a slightly different format. I am already able to do what I want in Oracle 8i, but I am having...
4
by: thilbert | last post by:
All, I have a perplexing problem that I hope someone can help me with. I have the following table struct: Permission ----------------- PermissionId Permission
3
by: Dam | last post by:
Using SqlServer : Query 1 : SELECT def.lID as IdDefinition, TDC_AUneValeur.VALEURDERETOUR as ValeurDeRetour FROM serveur.Data_tblDEFINITIONTABLEDECODES def,...
7
by: Steve | last post by:
I have a SQL query I'm invoking via VB6 & ADO 2.8, that requires three "Left Outer Joins" in order to return every transaction for a specific set of criteria. Using three "Left Outer Joins"...
3
by: Martin | last post by:
Hello everybody, I have the following question. As a join clause on Oracle we use " table1.field1 = table2.field1 (+) " On SQL Server we use " table1.field1 *= table2.field1 " Does DB2...
5
by: kumar_rangan1976 | last post by:
I need the below sybase code to be migrated in UDB : select distinct c.partnumber as I_PART, case when d.IntegratorID = 'DCX05' then 'U' when d.IntegratorID = 'DCX04' then 'M'...
9
by: shanevanle | last post by:
I have two tables that are pretty big. I need about 10 rows in the left table and the right table is filtered to 5 rows as well. It seems when I join the tables in the FROM clause, I have to...
0
by: DolphinDB | last post by:
The formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. However, some formulas are complex, leading to challenges in calculation. Take...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
0
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
0
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.