473,573 Members | 2,889 Online

# OLAP windowing legerdemain

Friends:

Say I have a simple table like so:

REGION (CHAR(3)
SSN CHAR(9)
ID SMALLINT

With the following data:

REGION SSN ID
------ --------- --
002 555112222 2
008 555112222 8
038 555112222 27
098 111223333 89
016 111223333 14

I'd like to find pair-wise duplicates based on SSN, yielding a result
set like so:

REGION_1 ID_1 SSN REGION_2 ID_2
-------- ---- ------- -------- ----
002 2 555112222 008 8
008 8 555112222 038 27
038 27 555112222 002 2
098 89 111223333 016 14

Now, I could certainly do a self-join (and may yet have to) WHERE ID !=
ID AND SSN = SSN, but this has at least two drawbacks: 1) two passes
through the access path, and 2) I'd get matches and their (redundant)
reciprocals, though on this latter point I'm OK with a little
post-filtering, algorithms for this being available (e.g., those by
Kyte and Molinaro).

What I'm hoping is that among you is someone who knows of a usage of
the OLAP windowing functions for just such an occasion.

My first attempt (below) seems OK for IDs, but generates reciprocals
and will get messy when I try to incoporate the correct corresponding
REGIONs.

WITH
SSNS
(
REGION,
SSN,
ID
)
AS
(
VALUES
('002','5551122 22', 2),
('008','5551122 22', 8),
('038','5551122 22', 27),
('098','1112233 33', 89),
('016','1112233 33', 14)
)
SELECT
ID ID_1,
CASE WHEN ID = ID1 THEN ID2 ELSE ID1 END ID_2
FROM
(
SELECT
ID,
MAX(ID) OVER (PARTITION BY SSN ORDER BY ID ROWS BETWEEN CURRENT ROW
AND 1 FOLLOWING) AS ID1,
MIN(ID) OVER (PARTITION BY SSN ORDER BY ID) AS ID2
FROM
SSNS A
) X

ID1 ID2
----------- -----------
14 89
89 14
2 8
8 27
27 2

Regards,

--Jeff

Oct 9 '06 #1
3 1620

jefftyzzer wrote:
Friends:

Say I have a simple table like so:

REGION (CHAR(3)
SSN CHAR(9)
ID SMALLINT

With the following data:

REGION SSN ID
------ --------- --
002 555112222 2
008 555112222 8
038 555112222 27
098 111223333 89
016 111223333 14

I'd like to find pair-wise duplicates based on SSN, yielding a result
set like so:

REGION_1 ID_1 SSN REGION_2 ID_2
-------- ---- ------- -------- ----
002 2 555112222 008 8
008 8 555112222 038 27
038 27 555112222 002 2
098 89 111223333 016 14
This should do the trick:

with region (REGION, SSN, ID) as (values
('002', '555112222', 2),
('008', '555112222', 8),
('038', '555112222', 27),
('098', '111223333', 89),
('016', '111223333', 14)),
intermediate_re sult (region, id1, ssn, id2, row, id_count) as
(select region, id as id1, ssn, coalesce(max(id ) over(partition by ssn
order by id rows between 1 following and 1 following), min(id)
over(partition by ssn)) as id2, row_number() over(partition by ssn) as
row, sum(1) over(partition by ssn) as id_count
from region
order by ssn, id)
select region, id1, ssn, id2 from intermediate_re sult
where id_count 1 and (id_count 2 or row = 1)
REGION ID1 SSN ID2
====== === === ===
016 14 111223333 89
002 2 555112222 8
008 8 555112222 27
038 27 555112222 2

-Chris

Oct 10 '06 #2
VERY nice! Thanks, Chris--this is a huge help. I appreciate your
spending time on this.

--Jeff

ChrisC wrote:
jefftyzzer wrote:
Friends:

Say I have a simple table like so:

REGION (CHAR(3)
SSN CHAR(9)
ID SMALLINT

With the following data:

REGION SSN ID
------ --------- --
002 555112222 2
008 555112222 8
038 555112222 27
098 111223333 89
016 111223333 14

I'd like to find pair-wise duplicates based on SSN, yielding a result
set like so:

REGION_1 ID_1 SSN REGION_2 ID_2
-------- ---- ------- -------- ----
002 2 555112222 008 8
008 8 555112222 038 27
038 27 555112222 002 2
098 89 111223333 016 14

This should do the trick:

with region (REGION, SSN, ID) as (values
('002', '555112222', 2),
('008', '555112222', 8),
('038', '555112222', 27),
('098', '111223333', 89),
('016', '111223333', 14)),
intermediate_re sult (region, id1, ssn, id2, row, id_count) as
(select region, id as id1, ssn, coalesce(max(id ) over(partition by ssn
order by id rows between 1 following and 1 following), min(id)
over(partition by ssn)) as id2, row_number() over(partition by ssn) as
row, sum(1) over(partition by ssn) as id_count
from region
order by ssn, id)
select region, id1, ssn, id2 from intermediate_re sult
where id_count 1 and (id_count 2 or row = 1)
REGION ID1 SSN ID2
====== === === ===
016 14 111223333 89
002 2 555112222 8
008 8 555112222 27
038 27 555112222 2

-Chris
Oct 10 '06 #3
To display Region2, it will be better to JOIN Regin table. This also
remove unneccesary rows.

-------------------- Commands Entered ------------------------------
WITH region (REGION, SSN, ID) as (values
('002', '555112222', 2),
('008', '555112222', 8),
('038', '555112222', 27),
('098', '111223333', 89),
('016', '111223333', 14)
)
SELECT region_1, id_1, S2.ssn, R.region AS region_2, id_2
FROM (SELECT region_1, id_1, ssn
, NULLIF(COALESCE (idf,idmin),idp ) AS id_2
FROM (SELECT region AS region_1, id AS id_1, ssn
, MAX(id) OVER(PARTITION BY ssn ORDER BY id
ROWS BETWEEN 1 FOLLOWING
AND 1 FOLLOWING) AS idf
, MAX(id) OVER(PARTITION BY ssn ORDER BY id
ROWS BETWEEN 1 PRECEDING
AND 1 PRECEDING) AS idp
, MIN(id) OVER(PARTITION BY ssn) AS idmin
FROM region
) S1
) S2
INNER JOIN
region R
ON R.id = S2.id_2
AND R.ssn = S2.ssn
;
---------------------------------------------------------------------
REGION_1 ID_1 SSN REGION_2 ID_2
-------- ----------- --------- -------- -----------
016 14 111223333 098 89
002 2 555112222 008 8
008 8 555112222 038 27
038 27 555112222 002 2

4 record(s) selected.

Oct 18 '06 #4

This thread has been closed and replies have been disabled. Please start a new discussion.