I'm a quantitative securities analyst working with Compustat data
(company fiscal reports and pricing feeds).
My coworker came across a problem that we fixed, but I'd like to
understand 'why' it was happening and just don't get it yet.
Here's the starting query (reduced to simple prefixes):
----INITIAL-----
declare @mthDate datetime
set @mthDate = (select max(datadate) from t)
declare @wkDate datetime
set @wkDate = (select max(datadate) from z)
Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.da te2
left join g on g.idA = e.idA and g.datadate=e.da te2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null
----END INITIAL-----------
As you can see 'z' is the main table that things are linked to via
outer joins (our security master). Table 'k' has a list of securities
that we wish not to have results for.
There are 77 entries in table k and 4933 in table z for that given
time. We'd expect 4856 to be in this, but no. it's 4400, and then the
next time you run it (no changes whatsover) it's 2312, and so on.
Every time you execute you get a different record count.
My thought/and fix was to move the (k.ticker) predicate out of the
where clause and get a differenced set from z using NOT EXISTS:
-----AMENDED---------------
from
(z where not exists(select * from k where k.ticker=y.tick er)) y
left join a on a.idA = y.idA and a.idB = y.idB
and a.datadate = y.datadate
left join b on b.idA = y.idA and b.idB = y.idB
and b.datadate = @mthDate
left join c on c.idA = y.idA and c.idB = y.idB
and c.datadate = @mthDate
left join d on d.idA = y.idA and d.idB = y.idB
and d.datadate = y.datadate
left join e on e.idA = y.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.da te2
left join g on g.idA = e.idA and g.datadate=e.da te2
left join h on h.idA = y.idA
left join k on k.ticker = y.ticker
left join m on m.idA = y.idA and m.idB=y.idB
where
y.datadate = @wkDate
<..some other expression filters...>
------------------------
And this works. It's stable now.
I'm hoping someone here can help me up the wisdom curve by explaining
to me 'why' the recordset kept changing before.
My guess is that the cost-based optimizer was resorting the outer joins
and handing back different sets as a result, but i want to understand,
and thought i'd come to this group for help.
I appreciate your time and look forward to replies.
Greg McIntire 7 1685
Greg,
it sounds like a bug to me. If the data doesn't change, then the query
result shouldn't change.
What version and service pack are you running? Can you create a
(simplified?) script that reproceduces the problem?
Gert-Jan
Greg wrote: I'm a quantitative securities analyst working with Compustat data (company fiscal reports and pricing feeds).
My coworker came across a problem that we fixed, but I'd like to understand 'why' it was happening and just don't get it yet.
Here's the starting query (reduced to simple prefixes):
----INITIAL-----
declare @mthDate datetime set @mthDate = (select max(datadate) from t) declare @wkDate datetime set @wkDate = (select max(datadate) from z)
Select ... from z left join a on a.idA = z.idA and a.idB = z.idB and a.datadate = z.datadate left join b on b.idA = z.idA and b.idB = z.idB and b.datadate = @mthDate left join c on c.idA = z.idA and c.idB = z.idB and c.datadate = @mthDate left join d on d.idA = z.idA and d.idB = z.idB and d.datadate = z.datadate left join e on e.idA = z.idA and e.datadate = @mthDate left join f on f.idA = e.idA and f.datadate=e.da te2 left join g on g.idA = e.idA and g.datadate=e.da te2 left join h on h.idA = z.idA left join k on k.ticker = z.ticker left join m on m.idA = z.idA and m.idB=z.idB where z.datadate = @wkDate <..some other expression filters...> and k.ticker is null
----END INITIAL-----------
As you can see 'z' is the main table that things are linked to via outer joins (our security master). Table 'k' has a list of securities that we wish not to have results for.
There are 77 entries in table k and 4933 in table z for that given time. We'd expect 4856 to be in this, but no. it's 4400, and then the next time you run it (no changes whatsover) it's 2312, and so on. Every time you execute you get a different record count.
My thought/and fix was to move the (k.ticker) predicate out of the where clause and get a differenced set from z using NOT EXISTS:
-----AMENDED--------------- from (z where not exists(select * from k where k.ticker=y.tick er)) y left join a on a.idA = y.idA and a.idB = y.idB and a.datadate = y.datadate left join b on b.idA = y.idA and b.idB = y.idB and b.datadate = @mthDate left join c on c.idA = y.idA and c.idB = y.idB and c.datadate = @mthDate left join d on d.idA = y.idA and d.idB = y.idB and d.datadate = y.datadate left join e on e.idA = y.idA and e.datadate = @mthDate left join f on f.idA = e.idA and f.datadate=e.da te2 left join g on g.idA = e.idA and g.datadate=e.da te2 left join h on h.idA = y.idA left join k on k.ticker = y.ticker left join m on m.idA = y.idA and m.idB=y.idB where y.datadate = @wkDate <..some other expression filters...>
------------------------
And this works. It's stable now.
I'm hoping someone here can help me up the wisdom curve by explaining to me 'why' the recordset kept changing before.
My guess is that the cost-based optimizer was resorting the outer joins and handing back different sets as a result, but i want to understand, and thought i'd come to this group for help.
I appreciate your time and look forward to replies.
Greg McIntire
It's Version 8.00.194
While my ego would like to say it's not me, it's Microsoft, I'm pretty
doubtful. I think it's much more likely that it's a mental mistake....
Greg
Well, could be, but I would install a service pack anyway. 8.00.194
means you have not installed any SQL-Server service pack, which
basically means you are missing all bug fixes of the last 5 years...
Gert-Jan
Greg wrote: It's Version 8.00.194
While my ego would like to say it's not me, it's Microsoft, I'm pretty doubtful. I think it's much more likely that it's a mental mistake....
Greg
alright will talk to our tech guy. thanks a bunch.
Greg
Greg (ja******@hotma il.com) writes: Select ... from z left join a on a.idA = z.idA and a.idB = z.idB and a.datadate = z.datadate left join b on b.idA = z.idA and b.idB = z.idB and b.datadate = @mthDate left join c on c.idA = z.idA and c.idB = z.idB and c.datadate = @mthDate left join d on d.idA = z.idA and d.idB = z.idB and d.datadate = z.datadate left join e on e.idA = z.idA and e.datadate = @mthDate left join f on f.idA = e.idA and f.datadate=e.da te2 left join g on g.idA = e.idA and g.datadate=e.da te2 left join h on h.idA = z.idA left join k on k.ticker = z.ticker left join m on m.idA = z.idA and m.idB=z.idB where z.datadate = @wkDate <..some other expression filters...> and k.ticker is null
The part with e, f and g looks suspicious to me. Not if I can tell
whether they are the cause of your SELECT:s returning a different
number. But you might want to have said:
left join (e
join f on f.idA = e.idA and f.datadate=e.da te2
join g on g.idA = e.idA and g.datadate=e.da te2)
on e.idA = z.idA
and e.datadate = @mthDate
--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se
Books Online for SQL Server SP3 at http://www.microsoft.com/sql/techinf...2000/books.asp
"Greg" <ja******@hotma il.com> wrote in message
news:11******** **************@ l41g2000cwc.goo glegroups.com.. . I'm a quantitative securities analyst working with Compustat data (company fiscal reports and pricing feeds).
My coworker came across a problem that we fixed, but I'd like to understand 'why' it was happening and just don't get it yet.
Here's the starting query (reduced to simple prefixes):
----INITIAL-----
declare @mthDate datetime set @mthDate = (select max(datadate) from t) declare @wkDate datetime set @wkDate = (select max(datadate) from z)
Select ... from z left join a on a.idA = z.idA and a.idB = z.idB and a.datadate = z.datadate left join b on b.idA = z.idA and b.idB = z.idB and b.datadate = @mthDate left join c on c.idA = z.idA and c.idB = z.idB and c.datadate = @mthDate left join d on d.idA = z.idA and d.idB = z.idB and d.datadate = z.datadate left join e on e.idA = z.idA and e.datadate = @mthDate left join f on f.idA = e.idA and f.datadate=e.da te2 left join g on g.idA = e.idA and g.datadate=e.da te2 left join h on h.idA = z.idA left join k on k.ticker = z.ticker left join m on m.idA = z.idA and m.idB=z.idB where z.datadate = @wkDate <..some other expression filters...> and k.ticker is null
----END INITIAL-----------
As you can see 'z' is the main table that things are linked to via outer joins (our security master). Table 'k' has a list of securities that we wish not to have results for.
There are 77 entries in table k and 4933 in table z for that given time. We'd expect 4856 to be in this, but no. it's 4400, and then the next time you run it (no changes whatsover) it's 2312, and so on. Every time you execute you get a different record count.
My thought/and fix was to move the (k.ticker) predicate out of the where clause and get a differenced set from z using NOT EXISTS:
-----AMENDED--------------- from (z where not exists(select * from k where k.ticker=y.tick er)) y left join a on a.idA = y.idA and a.idB = y.idB and a.datadate = y.datadate left join b on b.idA = y.idA and b.idB = y.idB and b.datadate = @mthDate left join c on c.idA = y.idA and c.idB = y.idB and c.datadate = @mthDate left join d on d.idA = y.idA and d.idB = y.idB and d.datadate = y.datadate left join e on e.idA = y.idA and e.datadate = @mthDate left join f on f.idA = e.idA and f.datadate=e.da te2 left join g on g.idA = e.idA and g.datadate=e.da te2 left join h on h.idA = y.idA left join k on k.ticker = y.ticker left join m on m.idA = y.idA and m.idB=y.idB where y.datadate = @wkDate <..some other expression filters...>
------------------------
And this works. It's stable now.
I'm hoping someone here can help me up the wisdom curve by explaining to me 'why' the recordset kept changing before.
My guess is that the cost-based optimizer was resorting the outer joins and handing back different sets as a result, but i want to understand, and thought i'd come to this group for help.
I appreciate your time and look forward to replies.
Greg McIntire
Your guess sounds pretty reasonable to me.
If you use the show query plan setting in Query Analyzer, you can look and
see if the plan selected changes from execution to execution.
As a general rule ... I would take the variable comparison filtering out of
the joining and into the where clause.
> As a general rule ... I would take the variable comparison filtering out of the joining and into the where clause.
You can't do that with outer joins, because that changes the meaning
(i.e. the resultset).
Gert-Jan This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: Martin |
last post by:
I am now working on SQL Server 2000 having had previous experience on
a different database. Both of the OUTER JOIN syntaxes is different
from what I am used to and I am finding it slightly confusing.
For example, given two tables :
wipm_tbl_mi
wipm_tbl_wi (which may not have data in it for a specific record that
exists in the first table.)
|
by: Omavlana |
last post by:
Hi,
I need your help to resolve this problem. I have written a right outer
join query between 2 indipendent tables as follows.
select b.Account_desc, b.Account, a.CSPL_CSPL from Actual_data_final a
right outer join Actual_account_Tbl b on a.account_desc =
b.account_desc where (a.source_type = 'TY02' or a.source_type is
null) and (a.month = '2ND HALF' or a.month is null) and (a.year = 2004
or a.year is null) and (a.product = 'NP' or...
|
by: Preston Landers |
last post by:
Hello all.
I am trying to write a query that "just" switches some data around so
it is shown in a slightly different format. I am already able to do
what I want in Oracle 8i, but I am having trouble making it work in
SQL Server 2000. I am not a database newbie, but I can't seem to
figure this one out so I am turning to the newsgroup. I am thinking
that some of the SQL Gurus out there have done this very thing a
thousand times before...
|
by: thilbert |
last post by:
All,
I have a perplexing problem that I hope someone can help me with.
I have the following table struct:
Permission
-----------------
PermissionId
Permission
|
by: Dam |
last post by:
Using SqlServer :
Query 1 :
SELECT def.lID as IdDefinition,
TDC_AUneValeur.VALEURDERETOUR as ValeurDeRetour
FROM serveur.Data_tblDEFINITIONTABLEDECODES def,
serveur.Data_tblTABLEDECODEAUNEVALEUR TDC_AUneValeur
where def.TYPEDETABLEDECODES = 4
| |
by: Steve |
last post by:
I have a SQL query I'm invoking via VB6 & ADO 2.8, that requires three
"Left Outer Joins" in order to return every transaction for a specific
set of criteria.
Using three "Left Outer Joins" slows the system down considerably.
I've tried creating a temp db, but I can't figure out how to execute
two select commands. (It throws the exception "The column prefix
'tempdb' does not match with a table name or alias name used in the
query.")
|
by: Martin |
last post by:
Hello everybody,
I have the following question.
As a join clause on Oracle we use " table1.field1 = table2.field1 (+) "
On SQL Server we use " table1.field1 *= table2.field1 "
Does DB2 have the same type of operator, without using the OUTER JOIN
syntax ?
|
by: kumar_rangan1976 |
last post by:
I need the below sybase code to be migrated in UDB :
select distinct
c.partnumber as I_PART,
case
when d.IntegratorID = 'DCX05' then 'U'
when d.IntegratorID = 'DCX04' then 'M'
when d.IntegratorID = 'DCX03' then 'E'
else ' ' end as 'C_LU_SRCE_PCHSNG',
o.SupplierCode as I_SUPLR_LOC,
|
by: shanevanle |
last post by:
I have two tables that are pretty big. I need about 10 rows in the
left table and the right table is filtered to 5 rows as well. It
seems when I join the tables in the FROM clause, I have to filter my
left table in the WHERE clause and cannot filter it in the FROM
clause.
This seems like it would cause a lot of overhead especially when my
left table is ten thousand rows. Am I wrong in thinking that the two
tables get joined with the...
|
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed.
This is as boiled down as I can make it.
Here is my compilation command:
g++-12 -std=c++20 -Wnarrowing bit_field.cpp
Here is the code in...
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth.
The Art of Business Website Design
Your website is...
| |
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own....
Now, this would greatly impact the work of software developers. The idea...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules.
He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms.
Adolph will...
|
by: conductexam |
last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one.
At the time of converting from word file to html my equations which are in the word document file was convert into image.
Globals.ThisAddIn.Application.ActiveDocument.Select();...
|
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols.
I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
|
by: 6302768590 |
last post by:
Hai team
i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
| |
by: muto222 |
last post by:
How can i add a mobile payment intergratation into php mysql website.
| |