I'm a quantitative securities analyst working with Compustat data
(company fiscal reports and pricing feeds).
My coworker came across a problem that we fixed, but I'd like to
understand 'why' it was happening and just don't get it yet.
Here's the starting query (reduced to simple prefixes):
----INITIAL-----
declare @mthDate datetime
set @mthDate = (select max(datadate) from t)
declare @wkDate datetime
set @wkDate = (select max(datadate) from z)
Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.date2
left join g on g.idA = e.idA and g.datadate=e.date2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null
----END INITIAL-----------
As you can see 'z' is the main table that things are linked to via
outer joins (our security master). Table 'k' has a list of securities
that we wish not to have results for.
There are 77 entries in table k and 4933 in table z for that given
time. We'd expect 4856 to be in this, but no. it's 4400, and then the
next time you run it (no changes whatsover) it's 2312, and so on.
Every time you execute you get a different record count.
My thought/and fix was to move the (k.ticker) predicate out of the
where clause and get a differenced set from z using NOT EXISTS:
-----AMENDED---------------
from
(z where not exists(select * from k where k.ticker=y.ticker)) y
left join a on a.idA = y.idA and a.idB = y.idB
and a.datadate = y.datadate
left join b on b.idA = y.idA and b.idB = y.idB
and b.datadate = @mthDate
left join c on c.idA = y.idA and c.idB = y.idB
and c.datadate = @mthDate
left join d on d.idA = y.idA and d.idB = y.idB
and d.datadate = y.datadate
left join e on e.idA = y.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.date2
left join g on g.idA = e.idA and g.datadate=e.date2
left join h on h.idA = y.idA
left join k on k.ticker = y.ticker
left join m on m.idA = y.idA and m.idB=y.idB
where
y.datadate = @wkDate
<..some other expression filters...>
------------------------
And this works. It's stable now.
I'm hoping someone here can help me up the wisdom curve by explaining
to me 'why' the recordset kept changing before.
My guess is that the cost-based optimizer was resorting the outer joins
and handing back different sets as a result, but i want to understand,
and thought i'd come to this group for help.
I appreciate your time and look forward to replies.
Greg McIntire 7 1659
Greg,
it sounds like a bug to me. If the data doesn't change, then the query
result shouldn't change.
What version and service pack are you running? Can you create a
(simplified?) script that reproceduces the problem?
Gert-Jan
Greg wrote: I'm a quantitative securities analyst working with Compustat data (company fiscal reports and pricing feeds).
My coworker came across a problem that we fixed, but I'd like to understand 'why' it was happening and just don't get it yet.
Here's the starting query (reduced to simple prefixes):
----INITIAL-----
declare @mthDate datetime set @mthDate = (select max(datadate) from t) declare @wkDate datetime set @wkDate = (select max(datadate) from z)
Select ... from z left join a on a.idA = z.idA and a.idB = z.idB and a.datadate = z.datadate left join b on b.idA = z.idA and b.idB = z.idB and b.datadate = @mthDate left join c on c.idA = z.idA and c.idB = z.idB and c.datadate = @mthDate left join d on d.idA = z.idA and d.idB = z.idB and d.datadate = z.datadate left join e on e.idA = z.idA and e.datadate = @mthDate left join f on f.idA = e.idA and f.datadate=e.date2 left join g on g.idA = e.idA and g.datadate=e.date2 left join h on h.idA = z.idA left join k on k.ticker = z.ticker left join m on m.idA = z.idA and m.idB=z.idB where z.datadate = @wkDate <..some other expression filters...> and k.ticker is null
----END INITIAL-----------
As you can see 'z' is the main table that things are linked to via outer joins (our security master). Table 'k' has a list of securities that we wish not to have results for.
There are 77 entries in table k and 4933 in table z for that given time. We'd expect 4856 to be in this, but no. it's 4400, and then the next time you run it (no changes whatsover) it's 2312, and so on. Every time you execute you get a different record count.
My thought/and fix was to move the (k.ticker) predicate out of the where clause and get a differenced set from z using NOT EXISTS:
-----AMENDED--------------- from (z where not exists(select * from k where k.ticker=y.ticker)) y left join a on a.idA = y.idA and a.idB = y.idB and a.datadate = y.datadate left join b on b.idA = y.idA and b.idB = y.idB and b.datadate = @mthDate left join c on c.idA = y.idA and c.idB = y.idB and c.datadate = @mthDate left join d on d.idA = y.idA and d.idB = y.idB and d.datadate = y.datadate left join e on e.idA = y.idA and e.datadate = @mthDate left join f on f.idA = e.idA and f.datadate=e.date2 left join g on g.idA = e.idA and g.datadate=e.date2 left join h on h.idA = y.idA left join k on k.ticker = y.ticker left join m on m.idA = y.idA and m.idB=y.idB where y.datadate = @wkDate <..some other expression filters...>
------------------------
And this works. It's stable now.
I'm hoping someone here can help me up the wisdom curve by explaining to me 'why' the recordset kept changing before.
My guess is that the cost-based optimizer was resorting the outer joins and handing back different sets as a result, but i want to understand, and thought i'd come to this group for help.
I appreciate your time and look forward to replies.
Greg McIntire
It's Version 8.00.194
While my ego would like to say it's not me, it's Microsoft, I'm pretty
doubtful. I think it's much more likely that it's a mental mistake....
Greg
Well, could be, but I would install a service pack anyway. 8.00.194
means you have not installed any SQL-Server service pack, which
basically means you are missing all bug fixes of the last 5 years...
Gert-Jan
Greg wrote: It's Version 8.00.194
While my ego would like to say it's not me, it's Microsoft, I'm pretty doubtful. I think it's much more likely that it's a mental mistake....
Greg
alright will talk to our tech guy. thanks a bunch.
Greg
Greg (ja******@hotmail.com) writes: Select ... from z left join a on a.idA = z.idA and a.idB = z.idB and a.datadate = z.datadate left join b on b.idA = z.idA and b.idB = z.idB and b.datadate = @mthDate left join c on c.idA = z.idA and c.idB = z.idB and c.datadate = @mthDate left join d on d.idA = z.idA and d.idB = z.idB and d.datadate = z.datadate left join e on e.idA = z.idA and e.datadate = @mthDate left join f on f.idA = e.idA and f.datadate=e.date2 left join g on g.idA = e.idA and g.datadate=e.date2 left join h on h.idA = z.idA left join k on k.ticker = z.ticker left join m on m.idA = z.idA and m.idB=z.idB where z.datadate = @wkDate <..some other expression filters...> and k.ticker is null
The part with e, f and g looks suspicious to me. Not if I can tell
whether they are the cause of your SELECT:s returning a different
number. But you might want to have said:
left join (e
join f on f.idA = e.idA and f.datadate=e.date2
join g on g.idA = e.idA and g.datadate=e.date2)
on e.idA = z.idA
and e.datadate = @mthDate
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se
Books Online for SQL Server SP3 at http://www.microsoft.com/sql/techinf...2000/books.asp
"Greg" <ja******@hotmail.com> wrote in message
news:11**********************@l41g2000cwc.googlegr oups.com... I'm a quantitative securities analyst working with Compustat data (company fiscal reports and pricing feeds).
My coworker came across a problem that we fixed, but I'd like to understand 'why' it was happening and just don't get it yet.
Here's the starting query (reduced to simple prefixes):
----INITIAL-----
declare @mthDate datetime set @mthDate = (select max(datadate) from t) declare @wkDate datetime set @wkDate = (select max(datadate) from z)
Select ... from z left join a on a.idA = z.idA and a.idB = z.idB and a.datadate = z.datadate left join b on b.idA = z.idA and b.idB = z.idB and b.datadate = @mthDate left join c on c.idA = z.idA and c.idB = z.idB and c.datadate = @mthDate left join d on d.idA = z.idA and d.idB = z.idB and d.datadate = z.datadate left join e on e.idA = z.idA and e.datadate = @mthDate left join f on f.idA = e.idA and f.datadate=e.date2 left join g on g.idA = e.idA and g.datadate=e.date2 left join h on h.idA = z.idA left join k on k.ticker = z.ticker left join m on m.idA = z.idA and m.idB=z.idB where z.datadate = @wkDate <..some other expression filters...> and k.ticker is null
----END INITIAL-----------
As you can see 'z' is the main table that things are linked to via outer joins (our security master). Table 'k' has a list of securities that we wish not to have results for.
There are 77 entries in table k and 4933 in table z for that given time. We'd expect 4856 to be in this, but no. it's 4400, and then the next time you run it (no changes whatsover) it's 2312, and so on. Every time you execute you get a different record count.
My thought/and fix was to move the (k.ticker) predicate out of the where clause and get a differenced set from z using NOT EXISTS:
-----AMENDED--------------- from (z where not exists(select * from k where k.ticker=y.ticker)) y left join a on a.idA = y.idA and a.idB = y.idB and a.datadate = y.datadate left join b on b.idA = y.idA and b.idB = y.idB and b.datadate = @mthDate left join c on c.idA = y.idA and c.idB = y.idB and c.datadate = @mthDate left join d on d.idA = y.idA and d.idB = y.idB and d.datadate = y.datadate left join e on e.idA = y.idA and e.datadate = @mthDate left join f on f.idA = e.idA and f.datadate=e.date2 left join g on g.idA = e.idA and g.datadate=e.date2 left join h on h.idA = y.idA left join k on k.ticker = y.ticker left join m on m.idA = y.idA and m.idB=y.idB where y.datadate = @wkDate <..some other expression filters...>
------------------------
And this works. It's stable now.
I'm hoping someone here can help me up the wisdom curve by explaining to me 'why' the recordset kept changing before.
My guess is that the cost-based optimizer was resorting the outer joins and handing back different sets as a result, but i want to understand, and thought i'd come to this group for help.
I appreciate your time and look forward to replies.
Greg McIntire
Your guess sounds pretty reasonable to me.
If you use the show query plan setting in Query Analyzer, you can look and
see if the plan selected changes from execution to execution.
As a general rule ... I would take the variable comparison filtering out of
the joining and into the where clause.
> As a general rule ... I would take the variable comparison filtering out of the joining and into the where clause.
You can't do that with outer joins, because that changes the meaning
(i.e. the resultset).
Gert-Jan This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: Martin |
last post by:
I am now working on SQL Server 2000 having had previous experience on
a different database. Both of the OUTER JOIN syntaxes is different
from what I am used to and I am finding it slightly...
|
by: Omavlana |
last post by:
Hi,
I need your help to resolve this problem. I have written a right outer
join query between 2 indipendent tables as follows.
select b.Account_desc, b.Account, a.CSPL_CSPL from...
|
by: Preston Landers |
last post by:
Hello all.
I am trying to write a query that "just" switches some data around so
it is shown in a slightly different format. I am already able to do
what I want in Oracle 8i, but I am having...
|
by: thilbert |
last post by:
All,
I have a perplexing problem that I hope someone can help me with.
I have the following table struct:
Permission
-----------------
PermissionId
Permission
|
by: Dam |
last post by:
Using SqlServer :
Query 1 :
SELECT def.lID as IdDefinition,
TDC_AUneValeur.VALEURDERETOUR as ValeurDeRetour
FROM serveur.Data_tblDEFINITIONTABLEDECODES def,...
|
by: Steve |
last post by:
I have a SQL query I'm invoking via VB6 & ADO 2.8, that requires three
"Left Outer Joins" in order to return every transaction for a specific
set of criteria.
Using three "Left Outer Joins"...
|
by: Martin |
last post by:
Hello everybody,
I have the following question.
As a join clause on Oracle we use " table1.field1 = table2.field1 (+) "
On SQL Server we use " table1.field1 *= table2.field1 "
Does DB2...
|
by: kumar_rangan1976 |
last post by:
I need the below sybase code to be migrated in UDB :
select distinct
c.partnumber as I_PART,
case
when d.IntegratorID = 'DCX05' then 'U'
when d.IntegratorID = 'DCX04' then 'M'...
|
by: shanevanle |
last post by:
I have two tables that are pretty big. I need about 10 rows in the
left table and the right table is filtered to 5 rows as well. It
seems when I join the tables in the FROM clause, I have to...
|
by: DolphinDB |
last post by:
The formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. However, some formulas are complex, leading to challenges in calculation.
Take...
|
by: ryjfgjl |
last post by:
ExcelToDatabase: batch import excel into database automatically...
|
by: isladogs |
last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM).
In this month's session, we are pleased to welcome back...
|
by: Vimpel783 |
last post by:
Hello!
Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
|
by: ArrayDB |
last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
|
by: PapaRatzi |
last post by:
Hello,
I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
|
by: Defcon1945 |
last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
|
by: Shællîpôpï 09 |
last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
|
by: af34tf |
last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
| |