Outer Join - shifting result set. - Microsoft SQL Server

Greg

I'm a quantitative securities analyst working with Compustat data
(company fiscal reports and pricing feeds).

My coworker came across a problem that we fixed, but I'd like to
understand 'why' it was happening and just don't get it yet.

Here's the starting query (reduced to simple prefixes):
----INITIAL-----

declare @mthDate datetime
set @mthDate = (select max(datadate) from t)
declare @wkDate datetime
set @wkDate = (select max(datadate) from z)

Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.da te2
left join g on g.idA = e.idA and g.datadate=e.da te2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null

----END INITIAL-----------

As you can see 'z' is the main table that things are linked to via
outer joins (our security master). Table 'k' has a list of securities
that we wish not to have results for.

There are 77 entries in table k and 4933 in table z for that given
time. We'd expect 4856 to be in this, but no. it's 4400, and then the
next time you run it (no changes whatsover) it's 2312, and so on.
Every time you execute you get a different record count.

My thought/and fix was to move the (k.ticker) predicate out of the
where clause and get a differenced set from z using NOT EXISTS:
-----AMENDED---------------
from
(z where not exists(select * from k where k.ticker=y.tick er)) y
left join a on a.idA = y.idA and a.idB = y.idB
and a.datadate = y.datadate
left join b on b.idA = y.idA and b.idB = y.idB
and b.datadate = @mthDate
left join c on c.idA = y.idA and c.idB = y.idB
and c.datadate = @mthDate
left join d on d.idA = y.idA and d.idB = y.idB
and d.datadate = y.datadate
left join e on e.idA = y.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.da te2
left join g on g.idA = e.idA and g.datadate=e.da te2
left join h on h.idA = y.idA
left join k on k.ticker = y.ticker
left join m on m.idA = y.idA and m.idB=y.idB
where
y.datadate = @wkDate
<..some other expression filters...>

------------------------

And this works. It's stable now.

I'm hoping someone here can help me up the wisdom curve by explaining
to me 'why' the recordset kept changing before.

My guess is that the cost-based optimizer was resorting the outer joins
and handing back different sets as a result, but i want to understand,
and thought i'd come to this group for help.

I appreciate your time and look forward to replies.

Greg McIntire

Jul 23 '05 #1

Subscribe Reply

1685

Gert-Jan Strik

Greg,

it sounds like a bug to me. If the data doesn't change, then the query
result shouldn't change.

What version and service pack are you running? Can you create a
(simplified?) script that reproceduces the problem?

Gert-Jan
Greg wrote:

I'm a quantitative securities analyst working with Compustat data
(company fiscal reports and pricing feeds).

My coworker came across a problem that we fixed, but I'd like to
understand 'why' it was happening and just don't get it yet.

Here's the starting query (reduced to simple prefixes):

----INITIAL-----

declare @mthDate datetime
set @mthDate = (select max(datadate) from t)
declare @wkDate datetime
set @wkDate = (select max(datadate) from z)

Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.da te2
left join g on g.idA = e.idA and g.datadate=e.da te2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null

----END INITIAL-----------

As you can see 'z' is the main table that things are linked to via
outer joins (our security master). Table 'k' has a list of securities
that we wish not to have results for.

There are 77 entries in table k and 4933 in table z for that given
time. We'd expect 4856 to be in this, but no. it's 4400, and then the
next time you run it (no changes whatsover) it's 2312, and so on.
Every time you execute you get a different record count.

My thought/and fix was to move the (k.ticker) predicate out of the
where clause and get a differenced set from z using NOT EXISTS:

-----AMENDED---------------
from
(z where not exists(select * from k where k.ticker=y.tick er)) y
left join a on a.idA = y.idA and a.idB = y.idB
and a.datadate = y.datadate
left join b on b.idA = y.idA and b.idB = y.idB
and b.datadate = @mthDate
left join c on c.idA = y.idA and c.idB = y.idB
and c.datadate = @mthDate
left join d on d.idA = y.idA and d.idB = y.idB
and d.datadate = y.datadate
left join e on e.idA = y.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.da te2
left join g on g.idA = e.idA and g.datadate=e.da te2
left join h on h.idA = y.idA
left join k on k.ticker = y.ticker
left join m on m.idA = y.idA and m.idB=y.idB
where
y.datadate = @wkDate
<..some other expression filters...>

------------------------

And this works. It's stable now.

I'm hoping someone here can help me up the wisdom curve by explaining
to me 'why' the recordset kept changing before.

My guess is that the cost-based optimizer was resorting the outer joins
and handing back different sets as a result, but i want to understand,
and thought i'd come to this group for help.

I appreciate your time and look forward to replies.

Greg McIntire

Jul 23 '05 #2

Greg

It's Version 8.00.194

While my ego would like to say it's not me, it's Microsoft, I'm pretty
doubtful. I think it's much more likely that it's a mental mistake....

Greg

Jul 23 '05 #3

Gert-Jan Strik

Well, could be, but I would install a service pack anyway. 8.00.194
means you have not installed any SQL-Server service pack, which
basically means you are missing all bug fixes of the last 5 years...

Gert-Jan
Greg wrote:

It's Version 8.00.194

While my ego would like to say it's not me, it's Microsoft, I'm pretty
doubtful. I think it's much more likely that it's a mental mistake....

Greg

Jul 23 '05 #4

Greg

alright will talk to our tech guy. thanks a bunch.

Greg

Jul 23 '05 #5

Erland Sommarskog

Greg (ja******@hotma il.com) writes:

Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.da te2
left join g on g.idA = e.idA and g.datadate=e.da te2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null

The part with e, f and g looks suspicious to me. Not if I can tell
whether they are the cause of your SELECT:s returning a different
number. But you might want to have said:

left join (e
join f on f.idA = e.idA and f.datadate=e.da te2
join g on g.idA = e.idA and g.datadate=e.da te2)
on e.idA = z.idA
and e.datadate = @mthDate
--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 23 '05 #6

David Rawheiser

"Greg" <ja******@hotma il.com> wrote in message
news:11******** **************@ l41g2000cwc.goo glegroups.com.. .

I'm a quantitative securities analyst working with Compustat data
(company fiscal reports and pricing feeds).

My coworker came across a problem that we fixed, but I'd like to
understand 'why' it was happening and just don't get it yet.

Here's the starting query (reduced to simple prefixes):
----INITIAL-----

declare @mthDate datetime
set @mthDate = (select max(datadate) from t)
declare @wkDate datetime
set @wkDate = (select max(datadate) from z)

Select
...
from
z
left join a on a.idA = z.idA and a.idB = z.idB
and a.datadate = z.datadate
left join b on b.idA = z.idA and b.idB = z.idB
and b.datadate = @mthDate
left join c on c.idA = z.idA and c.idB = z.idB
and c.datadate = @mthDate
left join d on d.idA = z.idA and d.idB = z.idB
and d.datadate = z.datadate
left join e on e.idA = z.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.da te2
left join g on g.idA = e.idA and g.datadate=e.da te2
left join h on h.idA = z.idA
left join k on k.ticker = z.ticker
left join m on m.idA = z.idA and m.idB=z.idB
where
z.datadate = @wkDate
<..some other expression filters...>
and k.ticker is null

----END INITIAL-----------

As you can see 'z' is the main table that things are linked to via
outer joins (our security master). Table 'k' has a list of securities
that we wish not to have results for.

There are 77 entries in table k and 4933 in table z for that given
time. We'd expect 4856 to be in this, but no. it's 4400, and then the
next time you run it (no changes whatsover) it's 2312, and so on.
Every time you execute you get a different record count.

My thought/and fix was to move the (k.ticker) predicate out of the
where clause and get a differenced set from z using NOT EXISTS:
-----AMENDED---------------
from
(z where not exists(select * from k where k.ticker=y.tick er)) y
left join a on a.idA = y.idA and a.idB = y.idB
and a.datadate = y.datadate
left join b on b.idA = y.idA and b.idB = y.idB
and b.datadate = @mthDate
left join c on c.idA = y.idA and c.idB = y.idB
and c.datadate = @mthDate
left join d on d.idA = y.idA and d.idB = y.idB
and d.datadate = y.datadate
left join e on e.idA = y.idA
and e.datadate = @mthDate
left join f on f.idA = e.idA and f.datadate=e.da te2
left join g on g.idA = e.idA and g.datadate=e.da te2
left join h on h.idA = y.idA
left join k on k.ticker = y.ticker
left join m on m.idA = y.idA and m.idB=y.idB
where
y.datadate = @wkDate
<..some other expression filters...>

------------------------

And this works. It's stable now.

I'm hoping someone here can help me up the wisdom curve by explaining
to me 'why' the recordset kept changing before.

My guess is that the cost-based optimizer was resorting the outer joins
and handing back different sets as a result, but i want to understand,
and thought i'd come to this group for help.

I appreciate your time and look forward to replies.

Greg McIntire

Your guess sounds pretty reasonable to me.

If you use the show query plan setting in Query Analyzer, you can look and
see if the plan selected changes from execution to execution.

As a general rule ... I would take the variable comparison filtering out of
the joining and into the where clause.

Jul 23 '05 #7

Gert-Jan Strik

> As a general rule ... I would take the variable comparison filtering out of

the joining and into the where clause.

You can't do that with outer joins, because that changes the meaning
(i.e. the resultset).

Gert-Jan

Jul 23 '05 #8

Similar topics

18113

Outer Join syntax

by: Martin | last post by:

I am now working on SQL Server 2000 having had previous experience on a different database. Both of the OUTER JOIN syntaxes is different from what I am used to and I am finding it slightly confusing. For example, given two tables : wipm_tbl_mi wipm_tbl_wi (which may not have data in it for a specific record that exists in the first table.)

Microsoft SQL Server

4855

Right outer join problem

by: Omavlana | last post by:

Hi, I need your help to resolve this problem. I have written a right outer join query between 2 indipendent tables as follows. select b.Account_desc, b.Account, a.CSPL_CSPL from Actual_data_final a right outer join Actual_account_Tbl b on a.account_desc = b.account_desc where (a.source_type = 'TY02' or a.source_type is null) and (a.month = '2ND HALF' or a.month is null) and (a.year = 2004 or a.year is null) and (a.product = 'NP' or...

Microsoft SQL Server

482

help with data transformations with outer join?

by: Preston Landers | last post by:

Hello all. I am trying to write a query that "just" switches some data around so it is shown in a slightly different format. I am already able to do what I want in Oracle 8i, but I am having trouble making it work in SQL Server 2000. I am not a database newbie, but I can't seem to figure this one out so I am turning to the newsgroup. I am thinking that some of the SQL Gurus out there have done this very thing a thousand times before...

Microsoft SQL Server

4871

Outer Join not working when using SQL-92

by: thilbert | last post by:

All, I have a perplexing problem that I hope someone can help me with. I have the following table struct: Permission ----------------- PermissionId Permission

Microsoft SQL Server

10054

Diff between "LEFT OUTER JOIN" and " *= " when condition

by: Dam | last post by:

Using SqlServer : Query 1 : SELECT def.lID as IdDefinition, TDC_AUneValeur.VALEURDERETOUR as ValeurDeRetour FROM serveur.Data_tblDEFINITIONTABLEDECODES def, serveur.Data_tblTABLEDECODEAUNEVALEUR TDC_AUneValeur where def.TYPEDETABLEDECODES = 4

Microsoft SQL Server

31566

Can someone help me with multiple "Left Outer Joins"?

by: Steve | last post by:

I have a SQL query I'm invoking via VB6 & ADO 2.8, that requires three "Left Outer Joins" in order to return every transaction for a specific set of criteria. Using three "Left Outer Joins" slows the system down considerably. I've tried creating a temp db, but I can't figure out how to execute two select commands. (It throws the exception "The column prefix 'tempdb' does not match with a table name or alias name used in the query.")

Microsoft SQL Server

19477

Alternatives for "OUTER JOIN"

by: Martin | last post by:

Hello everybody, I have the following question. As a join clause on Oracle we use " table1.field1 = table2.field1 (+) " On SQL Server we use " table1.field1 *= table2.field1 " Does DB2 have the same type of operator, without using the OUTER JOIN syntax ?

DB2 Database

3616

Migrate Sybase outer join query to UDB DB2 8.2

by: kumar_rangan1976 | last post by:

I need the below sybase code to be migrated in UDB : select distinct c.partnumber as I_PART, case when d.IntegratorID = 'DCX05' then 'U' when d.IntegratorID = 'DCX04' then 'M' when d.IntegratorID = 'DCX03' then 'E' else ' ' end as 'C_LU_SRCE_PCHSNG', o.SupplierCode as I_SUPLR_LOC,

DB2 Database

9228

FILTERING LEFT TABLE IN LEFT OUTER JOIN

by: shanevanle | last post by:

I have two tables that are pretty big. I need about 10 rows in the left table and the right table is filtered to 5 rows as well. It seems when I join the tables in the FROM clause, I have to filter my left table in the WHERE clause and cannot filter it in the FROM clause. This seems like it would cause a lot of overhead especially when my left table is ten thousand rows. Am I wrong in thinking that the two tables get joined with the...

DB2 Database

10363

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...

C / C++

10164

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...

Online Marketing

10110

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...

Windows Server

8989

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...

Career Advice

7512

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...

Microsoft Access / VBA

6745

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...

C# / C Sharp

5397

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

4066

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

C# / C Sharp

3669

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP