473,443 Members | 1,993 Online
Bytes | Software Development & Data Engineering Community
Create Post

Home Posts Topics Members FAQ

Convert correlated subqery to join?

I have a query that does a self join following by correlated subquery. I want to be able to select from "this row" or the latest row before "this row", based on the "rn" column, where column1 is populated. Its currently working using a correlated subquery as follows:

Expand|Select|Wrap|Line Numbers
  1. select 
  2. a.id,
  3. coalesce(a.typeName, b.typeName) as typeName
  4. FROM table1 a, table1 b
  5. where a.id = b.id AND b.rn = (   
  6.     select max(rn) from table1 where id = a.id and rn <= a.rn and column1 is not null
  7. )
  8.  
Its a bit slow though since its needs to execute the inner query for each row in the outer query. I have seen examples of converting this to a join but cant seem to get it to work, as I always need to reference the outer query for

Expand|Select|Wrap|Line Numbers
  1. rn <= a.rn
part.

Can anyone help converting this to a non-correlated query?

thanks
Sep 15 '10 #1

✓ answered by FLANDERS

OK I think I have this sorted now. The below query has gone from 17 minutes to 22 seconds on my machine:
Expand|Select|Wrap|Line Numbers
  1. -- Fills in the blanks of the items table by joining each item change record
  2. -- to the previous valid record for the same item, to allow retrival of last "real" price change
  3. select * from (    
  4.     select 
  5.     a.id,
  6.     coalesce(a.priceChange, b.priceChange) as priceChange,
  7. ROW_NUMBER() OVER (PARTITION BY a.id, a.aMonth ORDER BY b.rn desc) as rn1      
  8.     FROM items a, items b
  9.     where 
  10.     a.id = b.id AND     -- Only want records that relate to same item
  11.     b.rn <= a.rn and    -- Previous or same (ensures we get items with no change at all) records only 
  12.     b.priceChange IS NOT NULL   -- Previous record must be a "real" record
  13. ) tt where rn1 = 1 -- Get the latest "real" record that relates to same item. Latest is determined by the
  14. -- row number column being 1 (since we ordered the partition DESC)
  15.  
The correlated subquery has been replaced by the use of the analytical ROW_NUMBER() function. I do a self join to get all items before the "current" item, where there is a valid priceChange. The use of <= ensures that if there is no price change at all for a particular item, it will be joined to itself.
As part of the column selection, I assign each record a row number by partitioning the result set on the item id and month, ordering it so that the latest change record will be given a value of 1. This makes it possible to select the latest value using rn1 = 1 rather than selecting the max value.

There has been a huge performance increase using this strategy. It again confirms that correlated subqueries should be looked at if there is a query performance issue.

Thanks for your help, hopefully someone else will benefit from this in the future.

11 3262
NeoPa
32,556 Expert Mod 16PB
Perhaps it may be easier to understand what you're looking for if you described it in words.
Sep 15 '10 #2
OK, I used sample tables/columns in first example. There scenario is I have a table of item changes over time, basically ID, priceChange and a calculated row number. Some items may not change in a particular month and so I have a situation where I end up with a table like
Expand|Select|Wrap|Line Numbers
  1. ID    priceChange    Month       rn
  2. 1     100      March 2010  1
  3. 1     null     June 2010   2
  4. 1     200      July 2010   3
  5. 2     520      March 2010  1 
  6. 2     null     June 2010   2
  7.  
I want to fill in the null amount for each item by selecting the previous "real" item price for that particular item. Hence I want to end up with

Expand|Select|Wrap|Line Numbers
  1. ID    priceChange    Month    rn
  2. 1     100      March 2010  1
  3. 1     100      June 2010   2
  4. 1     200      July 2010   3
  5. 2     520      March 2010  1 
  6. 2     520      June 2010   2
  7.  
To do this, I do a join based on the item id being the same and select the record with largest rn value that has same ID.

Expand|Select|Wrap|Line Numbers
  1. select 
  2. a.id,
  3. coalesce(a.priceChange, b.priceChange) as priceChange
  4. FROM items a, items b
  5. where a.id = b.id AND b.rn = (   
  6.     select max(rn) from items where id = a.id and rn <= a.rn
  7. )
  8.  
It works fine, just a bit slow, due to the correlated subquery. I have seen some examples of how these can be replaced with JOINs but I cant get it to fit my problem

Is that any clearer?
Sep 15 '10 #3
NeoPa
32,556 Expert Mod 16PB
Yes. Certainly.

Try this :

Expand|Select|Wrap|Line Numbers
  1. SELECT [ID]
  2.      , COALESCE(tI.Price, sI.PrevPrice) AS [Price]
  3.      , [Month]
  4.      , [RN]
  5. FROM   [Items] AS tI
  6.        LEFT OUTER JOIN
  7.        (
  8.     SELECT   [Price] AS [PrevPrice]
  9.     FROM     [Items]
  10.              INNER JOIN
  11.              (
  12.         SELECT   [ID]
  13.                , MAX([RN]) AS MaxRN
  14.         FROM     [Items]
  15.         GROUP BY [ID]
  16.         ) AS sIi
  17.       ON     tI.ID = sIi.ID
  18.      AND     tI.RN > sIi.RN
  19.     ) AS sI
  20.   ON   tI.ID = sI.ID
First (logically - not in the SQL) find the RN of the previous record (The sub-sub-query sIi), then find the [Price] that matches that record (The sub-qury sI). Lastly, match this record (if found - hence LEFT OUTER JOIN) with the original and coalesce in the normal way.

Let us know how this works out for you.

Welcome to Bytes!
Sep 15 '10 #4
That doesnt run. The ON clause of the INNER JOIN is incorrect it appears:
Expand|Select|Wrap|Line Numbers
  1. ON tI.ID = sIi.ID AND tI.RN > sIi.RN
, since you are referring to an alias (tI) that isnt available at that point:
The multi-part identifier "tI.ID" could not be bound
Sep 15 '10 #5
NeoPa
32,556 Expert Mod 16PB
I'll try for a deeper look at this tomorrow. Sorry to get it wrong (at least so my initial tests would indicate).

Having most of my SQL experience in Jet (Access) I do sometimes lead people amiss (unintentionally of course). Preliminary tests indicate that T-SQL doesn't handle this as well as Jet does, which is something of a surprise. Normally the reverse is more likely. It may just be that I have the vocabulary wrong and it is supported but just using a different approach.
Sep 15 '10 #6
Jeez dont be apologising, any help is great!
Sep 15 '10 #7
ck9663
2,878 Expert 2GB
Tweak this for now while am working on your sample data...

Good Luck!!!

~~ CK
Sep 15 '10 #8
ck9663
2,878 Expert 2GB
Here you go...

Your data...

Expand|Select|Wrap|Line Numbers
  1. declare @YourTable as table
  2. (
  3.    ID int,
  4.    priceChange money,
  5.    cMonth varchar(50),
  6.    rn int
  7. )
  8. insert into @YourTable
  9. select 1,100, 'March 2010', 1
  10. union all
  11. select 1,null,'June 2010',    2
  12. union all
  13. select 1,200, 'July 2010',    3
  14. union all
  15. select 2,520, 'March 2010',   1 
  16. union all
  17. select 2,null,'June 2010',    2
  18.  
  19.  
Your SQL

Expand|Select|Wrap|Line Numbers
  1. select y1.*, LastValidPrice = isnull(y1.pricechange,y2.pricechange)
  2. from @YourTable y1
  3.    left join @YourTable y2 on y1.id = y2.id and y1.rn = y2.rn + 1
  4.  
Happy Coding!!!

~~ CK
Sep 15 '10 #9
Hey,
Thanks for the input. However, I think my sample data wasnt quite complete. Its not always valid to use
y1.rn = y2.rn + 1, the previous "real" value may be a few rows back. This is why I used the max() function, rather than what you did. A complete sample data would be

Expand|Select|Wrap|Line Numbers
  1. ID    priceChange    Month       rn
  2. 1     100      March 2010         1
  3. 1     null     April  2010        2
  4. 1     null     May 2010           3
  5. 1     null     June 2010          4
  6. 1     200      July 2010          5
  7. 2     520      March 2010         1 
  8. 2     null     April  2010        2
  9. 2     null     May 2010           3
  10. 2     300      June 2010          4
  11.  
In this case the value to be filled in for item 1 in April, May and June would come from March i.e. 100. Therefore I cant rely on it being always the previous row.

Any further ideas?
Sep 15 '10 #10
From tweaking your link, the following seems to work as required on the "updated" sample date:

Expand|Select|Wrap|Line Numbers
  1. select id, priceChange, 
  2. coalesce((
  3.    select priceChange 
  4.    from YourTable b 
  5.    where a.rn > b.rn and a.id = b.id and b.priceChange is not null and a.priceChange is null order by b.rn desc limit 1 ), 
  6.    a.priceChange
  7. ) tmp, 
  8. cMonth
  9. from YourTable a order by id, rn
  10.  
Is this also not a correlated subquery, which will have similar performance issues?
Sep 15 '10 #11
OK I think I have this sorted now. The below query has gone from 17 minutes to 22 seconds on my machine:
Expand|Select|Wrap|Line Numbers
  1. -- Fills in the blanks of the items table by joining each item change record
  2. -- to the previous valid record for the same item, to allow retrival of last "real" price change
  3. select * from (    
  4.     select 
  5.     a.id,
  6.     coalesce(a.priceChange, b.priceChange) as priceChange,
  7. ROW_NUMBER() OVER (PARTITION BY a.id, a.aMonth ORDER BY b.rn desc) as rn1      
  8.     FROM items a, items b
  9.     where 
  10.     a.id = b.id AND     -- Only want records that relate to same item
  11.     b.rn <= a.rn and    -- Previous or same (ensures we get items with no change at all) records only 
  12.     b.priceChange IS NOT NULL   -- Previous record must be a "real" record
  13. ) tt where rn1 = 1 -- Get the latest "real" record that relates to same item. Latest is determined by the
  14. -- row number column being 1 (since we ordered the partition DESC)
  15.  
The correlated subquery has been replaced by the use of the analytical ROW_NUMBER() function. I do a self join to get all items before the "current" item, where there is a valid priceChange. The use of <= ensures that if there is no price change at all for a particular item, it will be joined to itself.
As part of the column selection, I assign each record a row number by partitioning the result set on the item id and month, ordering it so that the latest change record will be given a value of 1. This makes it possible to select the latest value using rn1 = 1 rather than selecting the max value.

There has been a huge performance increase using this strategy. It again confirms that correlated subqueries should be looked at if there is a query performance issue.

Thanks for your help, hopefully someone else will benefit from this in the future.
Sep 16 '10 #12

Sign in to post your reply or Sign up for a free account.

Similar topics

1
by: Jason | last post by:
Hello All, I have a SQL Query with multiple correlated Subqueries in it. When it gets executed it runs rather slow due to the size of the QT table. Does anybody have any suggestions how to alter...
8
by: Venkata C | last post by:
Hi! Does anyone here know of a way to goad DB2 into converting a correlated subquery to a non-correlated one? Does DB2 ever do such a conversion? We have a query of the form SELECT .. FROM A...
1
by: Mike MacSween | last post by:
tblProductions one to many to tblEvents tblEvents contains StartDate I want a report where the data are grouped by tblProductions.ProdID, and sorted by the earliest date in each Production. ...
14
by: mike | last post by:
I'm using postgresl 7.3.2 and have a query that executes very slowly. There are 2 tables: Item and LogEvent. ItemID (an int4) is the primary key of Item, and is also a field in LogEvent. Some...
0
by: flamingo | last post by:
I have a query that works just fine, but because I need to use it in Business Objects and I have to use a prompt for the date, I need the initial query for the count as a correlated subquery in the...
0
by: flamingo | last post by:
I have a query that works just fine, but because I need to use it in Business Objects and I have to use a prompt for the date, I need the initial query for the count as a correlated subquery in the...
4
by: sql_server_user | last post by:
Hi, I have a history table with about 400 million rows, with a unique composite nonclustered index on two columns (object id and time period) which is used for most of the queries into the...
5
by: steven.fafel | last post by:
I am running 2 versions of a correlated subquery. The two version differ slightly in design but differ tremendously in performance....if anyone can answer this, you would be awesome. The "bad"...
1
by: AccessHunter | last post by:
Please Help.... I have a query with 2 tables, LegalFile and Cases. LegalFile has the field CaseID(Number Field) and Cases has CaseNbr(Text Field). I am trying to find entries that are in...
1
by: hergele | last post by:
Hello all.. I am trying to convert an application which is working on Oracle to postgreSql.. I've seen a query in code something like this... select * from table1 a1 , table1 a2, table2 a3, ...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
1
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...
0
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.