By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
440,213 Members | 2,108 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 440,213 IT Pros & Developers. It's quick & easy.

Help with a Join and Duplicates?

P: n/a
Hi All,
I am banging my head against a brick wall over this problem, so any
help in the correct direction would be muchly appreciated!

I have 2 SQL (MS SQL) server tables, realated to -
a Property,
Sales of that property.
A property is uniquely identifed by its Roll, valuation Number and
Suffix (not my choosing).

Each property can only appear in the property table once, and can only
have 1 assessment - but can have multiple sales (ie - over the
annalysis period the same property can sell more than once).

There is approximatly 19000 properties relating to about 8000 sales.
When creating a query to list property and most recent sale (if there
is any) I end up with somthing like this -
SELECT [roll], [valuation], [suffix], [sale date]
FROM [property]
LEFT JOIN [sales]
ON
[property].[roll] = [sales].[roll] AND
[property].[valuation] = [sales].[valuation] AND
[property].[suffix] = [sales].[suffix]
(table names simplifed).

I get rows where there is all the property data there, but sale date
(etc.) is null (as I would expect from a left join), but the problem is
- when there is more than 1 sale for a property it pulls out another
copy of the property data.

In short, because of that I come out with more records than properties.

ie -
roll valuation suffix sale date
12 456789 A 1/1/2003
12 788988 B NULL
14 123456 A 1/1/2003
14 123456 A 1/1/2004
(Note - the last two are the same property).
I didn't know that the left join can affect both joined tables!
Is there any way around this? Any suggestions/hints in the right
direction would be very much appreciated!

THANKS!

Jul 23 '05 #1
Share this Question
Share on Google+
3 Replies


P: n/a
On 11 Apr 2005 21:27:53 -0700, "MaxPenguin"
<ma*************@origen.co.nz> wrote:
Hi All,
I am banging my head against a brick wall over this problem, so any
help in the correct direction would be muchly appreciated!

I have 2 SQL (MS SQL) server tables, realated to -
a Property,
Sales of that property.
A property is uniquely identifed by its Roll, valuation Number and
Suffix (not my choosing).

Each property can only appear in the property table once, and can only
have 1 assessment - but can have multiple sales (ie - over the
annalysis period the same property can sell more than once).

There is approximatly 19000 properties relating to about 8000 sales.
When creating a query to list property and most recent sale (if there
is any) I end up with somthing like this -
SELECT [roll], [valuation], [suffix], [sale date]
FROM [property]
LEFT JOIN [sales]
ON
[property].[roll] = [sales].[roll] AND
[property].[valuation] = [sales].[valuation] AND
[property].[suffix] = [sales].[suffix]
(table names simplifed).

I get rows where there is all the property data there, but sale date
(etc.) is null (as I would expect from a left join), but the problem is
- when there is more than 1 sale for a property it pulls out another
copy of the property data.

In short, because of that I come out with more records than properties.

ie -
roll valuation suffix sale date
12 456789 A 1/1/2003
12 788988 B NULL
14 123456 A 1/1/2003
14 123456 A 1/1/2004
(Note - the last two are the same property).
I didn't know that the left join can affect both joined tables!
Is there any way around this? Any suggestions/hints in the right
direction would be very much appreciated!

THANKS!


Are you able to tell us what want the query to return?
(And why?)
Jul 23 '05 #2

P: n/a
It isn't really a matter of the left join "affecting both tables". For
any join that you use, if you have a many to one relationship then you
are going to get multiple rows for the "one" side since the join
effectively creates a cartesian product to start with. Whittling that
down is a matter of your join criteria.

In your description of the problem you state that you want the "most
recent sale" but there is nothing in your query to try to limit the
results to the most recent sale. Either of the queries below should
work. I usually see better performance using the LEFT JOIN/IS NOT NULL
method, but using NOT EXISTS is a bit more readable/logical in my
opinion.

SELECT p.roll, p.valuation, p.suffix, s.[sale date]
FROM property p
LEFT JOIN sales s ON p.roll = s.roll
AND p.valuation = s.valuation
AND p.suffix = s.suffix
LEFT JOIN sales s2 ON s2.roll = s.roll
AND s2.valuation = s.valuation
AND s2.suffix = s.suffix
AND s2.[sale date] > s.[sale date]
WHERE s2.roll IS NULL

SELECT p.roll, p.valuation, p.suffix, s.[sale date]
FROM property p
LEFT JOIN sales s ON p.roll = s.roll
AND p.valuation = s.valuation
AND p.suffix = s.suffix
WHERE NOT EXISTS (SELECT *
FROM sales s2
WHERE s2.valuation = s.valuation
AND s2.suffix = s.suffix
AND s2.roll = s.roll
AND s2.[sale date] > s.[sale date])

Both methods assume that you cannot have two sales on the same exact
date for the same property. They both effectively remove any rows where
there is another row for the same property with a later sale date. That
will leave only those rows with the latest (most recent) sale date.

HTH,
-Tom.

Jul 23 '05 #3

P: n/a

Thomas R. Hummel wrote:
In your description of the problem you state that you want the "most
recent sale" but there is nothing in your query to try to limit the
results to the most recent sale. Either of the queries below should
work. I usually see better performance using the LEFT JOIN/IS NOT NULL method, but using NOT EXISTS is a bit more readable/logical in my
opinion.

Thanks your solution works perfectly. Thats exactly what I needed
really - a way to tell it that I did just want the most recent sales.

Thanks again :)!



SELECT p.roll, p.valuation, p.suffix, s.[sale date]
FROM property p
LEFT JOIN sales s ON p.roll = s.roll
AND p.valuation = s.valuation
AND p.suffix = s.suffix
LEFT JOIN sales s2 ON s2.roll = s.roll
AND s2.valuation = s.valuation
AND s2.suffix = s.suffix
AND s2.[sale date] > s.[sale date]
WHERE s2.roll IS NULL

SELECT p.roll, p.valuation, p.suffix, s.[sale date]
FROM property p
LEFT JOIN sales s ON p.roll = s.roll
AND p.valuation = s.valuation
AND p.suffix = s.suffix
WHERE NOT EXISTS (SELECT *
FROM sales s2
WHERE s2.valuation = s.valuation
AND s2.suffix = s.suffix
AND s2.roll = s.roll
AND s2.[sale date] > s.[sale date])

Both methods assume that you cannot have two sales on the same exact
date for the same property. They both effectively remove any rows where there is another row for the same property with a later sale date. That will leave only those rows with the latest (most recent) sale date.

HTH,
-Tom.


Jul 23 '05 #4

This discussion thread is closed

Replies have been disabled for this discussion.