473,786 Members | 2,737 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Outer Join - shifting result set.

I'm a quantitative securities analyst working with Compustat data
(company fiscal reports and pricing feeds).

My coworker came across a problem that we fixed, but I'd like to
understand 'why' it was happening and just don't get it yet.

Here's the starting query (reduced to simple prefixes):
----INITIAL-----

declare @mthDate datetime
set @mthDate = (select max(datadate) from t)
declare @wkDate datetime
set @wkDate = (select max(datadate) from z)

Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.da te2
left join g on g.idA = e.idA and g.datadate=e.da te2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null

----END INITIAL-----------

As you can see 'z' is the main table that things are linked to via
outer joins (our security master). Table 'k' has a list of securities
that we wish not to have results for.

There are 77 entries in table k and 4933 in table z for that given
time. We'd expect 4856 to be in this, but no. it's 4400, and then the
next time you run it (no changes whatsover) it's 2312, and so on.
Every time you execute you get a different record count.

My thought/and fix was to move the (k.ticker) predicate out of the
where clause and get a differenced set from z using NOT EXISTS:
-----AMENDED---------------
from
(z where not exists(select * from k where k.ticker=y.tick er)) y
left join a on a.idA = y.idA and a.idB = y.idB
and a.datadate = y.datadate
left join b on b.idA = y.idA and b.idB = y.idB
and b.datadate = @mthDate
left join c on c.idA = y.idA and c.idB = y.idB
and c.datadate = @mthDate
left join d on d.idA = y.idA and d.idB = y.idB
and d.datadate = y.datadate
left join e on e.idA = y.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.da te2
left join g on g.idA = e.idA and g.datadate=e.da te2
left join h on h.idA = y.idA
left join k on k.ticker = y.ticker
left join m on m.idA = y.idA and m.idB=y.idB
where
y.datadate = @wkDate
<..some other expression filters...>

------------------------

And this works. It's stable now.

I'm hoping someone here can help me up the wisdom curve by explaining
to me 'why' the recordset kept changing before.

My guess is that the cost-based optimizer was resorting the outer joins
and handing back different sets as a result, but i want to understand,
and thought i'd come to this group for help.

I appreciate your time and look forward to replies.

Greg McIntire

Jul 23 '05 #1
7 1685
Greg,

it sounds like a bug to me. If the data doesn't change, then the query
result shouldn't change.

What version and service pack are you running? Can you create a
(simplified?) script that reproceduces the problem?

Gert-Jan
Greg wrote:

I'm a quantitative securities analyst working with Compustat data
(company fiscal reports and pricing feeds).

My coworker came across a problem that we fixed, but I'd like to
understand 'why' it was happening and just don't get it yet.

Here's the starting query (reduced to simple prefixes):

----INITIAL-----

declare @mthDate datetime
set @mthDate = (select max(datadate) from t)
declare @wkDate datetime
set @wkDate = (select max(datadate) from z)

Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.da te2
left join g on g.idA = e.idA and g.datadate=e.da te2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null

----END INITIAL-----------

As you can see 'z' is the main table that things are linked to via
outer joins (our security master). Table 'k' has a list of securities
that we wish not to have results for.

There are 77 entries in table k and 4933 in table z for that given
time. We'd expect 4856 to be in this, but no. it's 4400, and then the
next time you run it (no changes whatsover) it's 2312, and so on.
Every time you execute you get a different record count.

My thought/and fix was to move the (k.ticker) predicate out of the
where clause and get a differenced set from z using NOT EXISTS:

-----AMENDED---------------
from
(z where not exists(select * from k where k.ticker=y.tick er)) y
left join a on a.idA = y.idA and a.idB = y.idB
and a.datadate = y.datadate
left join b on b.idA = y.idA and b.idB = y.idB
and b.datadate = @mthDate
left join c on c.idA = y.idA and c.idB = y.idB
and c.datadate = @mthDate
left join d on d.idA = y.idA and d.idB = y.idB
and d.datadate = y.datadate
left join e on e.idA = y.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.da te2
left join g on g.idA = e.idA and g.datadate=e.da te2
left join h on h.idA = y.idA
left join k on k.ticker = y.ticker
left join m on m.idA = y.idA and m.idB=y.idB
where
y.datadate = @wkDate
<..some other expression filters...>

------------------------

And this works. It's stable now.

I'm hoping someone here can help me up the wisdom curve by explaining
to me 'why' the recordset kept changing before.

My guess is that the cost-based optimizer was resorting the outer joins
and handing back different sets as a result, but i want to understand,
and thought i'd come to this group for help.

I appreciate your time and look forward to replies.

Greg McIntire

Jul 23 '05 #2
It's Version 8.00.194

While my ego would like to say it's not me, it's Microsoft, I'm pretty
doubtful. I think it's much more likely that it's a mental mistake....

Greg

Jul 23 '05 #3
Well, could be, but I would install a service pack anyway. 8.00.194
means you have not installed any SQL-Server service pack, which
basically means you are missing all bug fixes of the last 5 years...

Gert-Jan
Greg wrote:

It's Version 8.00.194

While my ego would like to say it's not me, it's Microsoft, I'm pretty
doubtful. I think it's much more likely that it's a mental mistake....

Greg

Jul 23 '05 #4
alright will talk to our tech guy. thanks a bunch.

Greg

Jul 23 '05 #5
Greg (ja******@hotma il.com) writes:
Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.da te2
left join g on g.idA = e.idA and g.datadate=e.da te2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null

The part with e, f and g looks suspicious to me. Not if I can tell
whether they are the cause of your SELECT:s returning a different
number. But you might want to have said:

left join (e
join f on f.idA = e.idA and f.datadate=e.da te2
join g on g.idA = e.idA and g.datadate=e.da te2)
on e.idA = z.idA
and e.datadate = @mthDate
--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #6

"Greg" <ja******@hotma il.com> wrote in message
news:11******** **************@ l41g2000cwc.goo glegroups.com.. .
I'm a quantitative securities analyst working with Compustat data
(company fiscal reports and pricing feeds).

My coworker came across a problem that we fixed, but I'd like to
understand 'why' it was happening and just don't get it yet.

Here's the starting query (reduced to simple prefixes):
----INITIAL-----

declare @mthDate datetime
set @mthDate = (select max(datadate) from t)
declare @wkDate datetime
set @wkDate = (select max(datadate) from z)

Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.da te2
left join g on g.idA = e.idA and g.datadate=e.da te2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null

----END INITIAL-----------

As you can see 'z' is the main table that things are linked to via
outer joins (our security master). Table 'k' has a list of securities
that we wish not to have results for.

There are 77 entries in table k and 4933 in table z for that given
time. We'd expect 4856 to be in this, but no. it's 4400, and then the
next time you run it (no changes whatsover) it's 2312, and so on.
Every time you execute you get a different record count.

My thought/and fix was to move the (k.ticker) predicate out of the
where clause and get a differenced set from z using NOT EXISTS:
-----AMENDED---------------
from
(z where not exists(select * from k where k.ticker=y.tick er)) y
left join a on a.idA = y.idA and a.idB = y.idB
and a.datadate = y.datadate
left join b on b.idA = y.idA and b.idB = y.idB
and b.datadate = @mthDate
left join c on c.idA = y.idA and c.idB = y.idB
and c.datadate = @mthDate
left join d on d.idA = y.idA and d.idB = y.idB
and d.datadate = y.datadate
left join e on e.idA = y.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.da te2
left join g on g.idA = e.idA and g.datadate=e.da te2
left join h on h.idA = y.idA
left join k on k.ticker = y.ticker
left join m on m.idA = y.idA and m.idB=y.idB
where
y.datadate = @wkDate
<..some other expression filters...>

------------------------

And this works. It's stable now.

I'm hoping someone here can help me up the wisdom curve by explaining
to me 'why' the recordset kept changing before.

My guess is that the cost-based optimizer was resorting the outer joins
and handing back different sets as a result, but i want to understand,
and thought i'd come to this group for help.

I appreciate your time and look forward to replies.

Greg McIntire


Your guess sounds pretty reasonable to me.

If you use the show query plan setting in Query Analyzer, you can look and
see if the plan selected changes from execution to execution.

As a general rule ... I would take the variable comparison filtering out of
the joining and into the where clause.

Jul 23 '05 #7
> As a general rule ... I would take the variable comparison filtering out of
the joining and into the where clause.


You can't do that with outer joins, because that changes the meaning
(i.e. the resultset).

Gert-Jan
Jul 23 '05 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
18113
by: Martin | last post by:
I am now working on SQL Server 2000 having had previous experience on a different database. Both of the OUTER JOIN syntaxes is different from what I am used to and I am finding it slightly confusing. For example, given two tables : wipm_tbl_mi wipm_tbl_wi (which may not have data in it for a specific record that exists in the first table.)
4
4855
by: Omavlana | last post by:
Hi, I need your help to resolve this problem. I have written a right outer join query between 2 indipendent tables as follows. select b.Account_desc, b.Account, a.CSPL_CSPL from Actual_data_final a right outer join Actual_account_Tbl b on a.account_desc = b.account_desc where (a.source_type = 'TY02' or a.source_type is null) and (a.month = '2ND HALF' or a.month is null) and (a.year = 2004 or a.year is null) and (a.product = 'NP' or...
0
482
by: Preston Landers | last post by:
Hello all. I am trying to write a query that "just" switches some data around so it is shown in a slightly different format. I am already able to do what I want in Oracle 8i, but I am having trouble making it work in SQL Server 2000. I am not a database newbie, but I can't seem to figure this one out so I am turning to the newsgroup. I am thinking that some of the SQL Gurus out there have done this very thing a thousand times before...
4
4871
by: thilbert | last post by:
All, I have a perplexing problem that I hope someone can help me with. I have the following table struct: Permission ----------------- PermissionId Permission
3
10054
by: Dam | last post by:
Using SqlServer : Query 1 : SELECT def.lID as IdDefinition, TDC_AUneValeur.VALEURDERETOUR as ValeurDeRetour FROM serveur.Data_tblDEFINITIONTABLEDECODES def, serveur.Data_tblTABLEDECODEAUNEVALEUR TDC_AUneValeur where def.TYPEDETABLEDECODES = 4
7
31566
by: Steve | last post by:
I have a SQL query I'm invoking via VB6 & ADO 2.8, that requires three "Left Outer Joins" in order to return every transaction for a specific set of criteria. Using three "Left Outer Joins" slows the system down considerably. I've tried creating a temp db, but I can't figure out how to execute two select commands. (It throws the exception "The column prefix 'tempdb' does not match with a table name or alias name used in the query.")
3
19477
by: Martin | last post by:
Hello everybody, I have the following question. As a join clause on Oracle we use " table1.field1 = table2.field1 (+) " On SQL Server we use " table1.field1 *= table2.field1 " Does DB2 have the same type of operator, without using the OUTER JOIN syntax ?
5
3616
by: kumar_rangan1976 | last post by:
I need the below sybase code to be migrated in UDB : select distinct c.partnumber as I_PART, case when d.IntegratorID = 'DCX05' then 'U' when d.IntegratorID = 'DCX04' then 'M' when d.IntegratorID = 'DCX03' then 'E' else ' ' end as 'C_LU_SRCE_PCHSNG', o.SupplierCode as I_SUPLR_LOC,
9
9228
by: shanevanle | last post by:
I have two tables that are pretty big. I need about 10 rows in the left table and the right table is filtered to 5 rows as well. It seems when I join the tables in the FROM clause, I have to filter my left table in the WHERE clause and cannot filter it in the FROM clause. This seems like it would cause a lot of overhead especially when my left table is ten thousand rows. Am I wrong in thinking that the two tables get joined with the...
0
10363
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10164
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10110
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8989
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7512
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6745
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5397
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
1
4066
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
3669
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.