Hello,
I need to somehow relate certain records together in a table and I am
wondering if this is possible in SQL at all. Consider the table
definition below. I need to find all pairs that have associations
whether through id1 or id2. For the sample data set, the result should
be two groupings: (1, 3, 5) and (2,4). The result could be in any
format (rows and/or columns) as long as there is some indication of the
groupings so I can print out a report.
Here is the sql to create a test case with a small sample of the data:
create table pairs (id1 integer, id2 integer);
insert into pairs values (1, 1);
insert into pairs values (1, 5);
insert into pairs values (3, 3);
insert into pairs values (3, 5);
insert into pairs values (5, 1);
insert into pairs values (5, 3);
insert into pairs values (5, 5);
insert into pairs values (2, 4); 12 1679 de*******@yahoo.com wrote:
Hello,
I need to somehow relate certain records together in a table and I am
wondering if this is possible in SQL at all. Consider the table
definition below. I need to find all pairs that have associations
whether through id1 or id2. For the sample data set, the result should
be two groupings: (1, 3, 5) and (2,4). The result could be in any
format (rows and/or columns) as long as there is some indication of the
groupings so I can print out a report.
In general I would say no, this is not *practically* possible in SQL.
What you are describing is an unrestricted graph and sql does not
handle this very well. See for example this article:
* Maintaining the transitive closure of graphs in SQL.
Guozhu Dong, Leonid Libkin, Jianwen Su and Limsoon Wong.
Int. Journal of Information Technology, 5 (1999), 4678. http://citeseer.ist.psu.edu/cache/pa...aintaining.pdf
Especially chapter 4: Transitive closure of arbitrary directed graphs
If you could restrict your graph somehow, there might be easier
solutions lurking around. Trees for example are quite easy to handle,
and there are a number of well known methods.
/Lennart
>I need to somehow relate certain records [sic] together in a table and I am wondering if this is possible in SQL at all. Consider the table definition below. I need to find all pairs that have associations whether through id1 or id2. <<
Can I assume that your DDL should have been:
CREATE TABLE Pairs
(id1 INTEGER NOT NULL,
id2 INTEGER NOT NULL,
PRIMARY KEY(id1, id2));
Can you define what you mean by "association" ? I sse that {1,3,5} all
are in a group with (id1 = 5) , but since you do not have a (2,2) pair,
how did you get {2,4} ?
Lennart wrote:
In general I would say no, this is not *practically* possible in SQL.
What you are describing is an unrestricted graph and sql does not
handle this very well. See for example this article:
* Maintaining the transitive closure of graphs in SQL.
Guozhu Dong, Leonid Libkin, Jianwen Su and Limsoon Wong.
Int. Journal of Information Technology, 5 (1999), 4678.
http://citeseer.ist.psu.edu/cache/pa...aintaining.pdf
Especially chapter 4: Transitive closure of arbitrary directed graphs
Thanks a lot for the reference. I am just reading it now and will see
what assumptions we can make about the data to simplify the solution.
CELKO wrote:
I need to somehow relate certain records [sic] together in a table and I am wondering if this is possible in SQL at all. Consider the table definition below. I need to find all pairs that have associations whether through id1 or id2. <<
Can I assume that your DDL should have been:
CREATE TABLE Pairs
(id1 INTEGER NOT NULL,
id2 INTEGER NOT NULL,
PRIMARY KEY(id1, id2));
Yes, that is right. I did not include the constraints for sake of
brevity.
Can you define what you mean by "association" ? I sse that {1,3,5} all
are in a group with (id1 = 5) , but since you do not have a (2,2) pair,
how did you get {2,4} ?
Good observation! This is my mistake. The full set should also
include records (2,2), (4,2) and (4,4).
As a general rule, each record is always associated to itself. What I
mean by association is based on a pretty loose definition: if there is
a record with (x, y) in the table, then x is associated to y, and y is
associated to x. The challenge is to identify all the chains of
associations in the whole table.
Thanks! de*******@yahoo.com wrote:
Hello,
I need to somehow relate certain records together in a table and I am
wondering if this is possible in SQL at all. Consider the table
definition below. I need to find all pairs that have associations
whether through id1 or id2. For the sample data set, the result should
be two groupings: (1, 3, 5) and (2,4). The result could be in any
format (rows and/or columns) as long as there is some indication of the
groupings so I can print out a report.
Here is the sql to create a test case with a small sample of the data:
create table pairs (id1 integer, id2 integer);
insert into pairs values (1, 1);
insert into pairs values (1, 5);
insert into pairs values (3, 3);
insert into pairs values (3, 5);
insert into pairs values (5, 1);
insert into pairs values (5, 3);
insert into pairs values (5, 5);
insert into pairs values (2, 4);
Isn't this just a recursive query?
B. de*******@yahoo.com wrote:
CELKO wrote:
[...]
Can you define what you mean by "association" ? I sse that {1,3,5} all
are in a group with (id1 = 5) , but since you do not have a (2,2) pair,
how did you get {2,4} ?
Good observation! This is my mistake. The full set should also
include records (2,2), (4,2) and (4,4).
Why is that necessary? If the graph is undirected why not store each
edge once, and remove the loops (x,x)? With your sample data that would
leave us with:
insert into pairs values (1, 5);
insert into pairs values (3, 5);
insert into pairs values (2, 4);
A constraint like CHECK ( id1 < id2 ) could be of help.
By storing both directions and the loops you will have to verify that
for each (x,y) you also have (x,x) (y,y) and (y,x).
Just my 2 euro thoughts
/Lennart
[...]
Lennart wrote:
[...]
A constraint like CHECK ( id1 < id2 ) could be of help.
Note that it doesnt prevent one from adding, say (1,3) to Pairs
/Lennart
Can we assume that you can get rid of identitytype pairs, e.g., {1,1},
as well as one of any reciprocals, e.g., remove either {3,5} or {5,3}?
While you're mulling that over, you might want to take a look at a
posting of mine in this forum titled "Trees, recursion, and grouping,"
in which I was asking in essence the same question.
Some very helpful queries were proferred, as was a clarification by Joe
that I was not dealing with a tree, but rather with a forest, as you
are (after deduping as I suggest above, you are left w/ two
independent root nodes: 1 and 2).
In a followup to this posting, I will post the query I ended up with
(as soon as I track it down :).
Jeff de*******@yahoo.com wrote:
Hello,
I need to somehow relate certain records together in a table and I am
wondering if this is possible in SQL at all. Consider the table
definition below. I need to find all pairs that have associations
whether through id1 or id2. For the sample data set, the result should
be two groupings: (1, 3, 5) and (2,4). The result could be in any
format (rows and/or columns) as long as there is some indication of the
groupings so I can print out a report.
Here is the sql to create a test case with a small sample of the data:
create table pairs (id1 integer, id2 integer);
insert into pairs values (1, 1);
insert into pairs values (1, 5);
insert into pairs values (3, 3);
insert into pairs values (3, 5);
insert into pairs values (5, 1);
insert into pairs values (5, 3);
insert into pairs values (5, 5);
insert into pairs values (2, 4);
Great paperthanks for sharing it!
Jeff
Lennart wrote:
de*******@yahoo.com wrote:
Hello,
I need to somehow relate certain records together in a table and I am
wondering if this is possible in SQL at all. Consider the table
definition below. I need to find all pairs that have associations
whether through id1 or id2. For the sample data set, the result should
be two groupings: (1, 3, 5) and (2,4). The result could be in any
format (rows and/or columns) as long as there is some indication of the
groupings so I can print out a report.
In general I would say no, this is not *practically* possible in SQL.
What you are describing is an unrestricted graph and sql does not
handle this very well. See for example this article:
* Maintaining the transitive closure of graphs in SQL.
Guozhu Dong, Leonid Libkin, Jianwen Su and Limsoon Wong.
Int. Journal of Information Technology, 5 (1999), 4678.
http://citeseer.ist.psu.edu/cache/pa...aintaining.pdf
Especially chapter 4: Transitive closure of arbitrary directed graphs
If you could restrict your graph somehow, there might be easier
solutions lurking around. Trees for example are quite easy to handle,
and there are a number of well known methods.
/Lennart
OKhere's the gist of what I came up with (with much help from those
who replied to my own posting):
WITH
PAIRS(ID_A, ID_B) AS
(
VALUES
(1,5),
(5,3),
(2,4)
),
PAIR_TREE(LEVEL, ID_A, ID_B)
AS
(
SELECT
DISTINCT 1,
PRS1.ID_A,
PRS1.ID_A
FROM
PAIRS PRS1
WHERE
NOT EXISTS
(
SELECT
1
FROM
PAIRS PRS2
WHERE
PRS2.ID_B = PRS1.ID_A
)
UNION ALL
SELECT
PT.LEVEL + 1,
PT.ID_A,
PRS.ID_B
FROM
PAIRS PRS,
PAIR_TREE PT
WHERE
PRS.ID_A = PT.ID_B
)
SELECT
DENSE_RANK() OVER(ORDER BY PT.ID_A),
CASE
LEVEL
WHEN
1
THEN
PT.ID_A
ELSE
PT.ID_B
END
FROM
PAIR_TREE PT
ORDER BY
PT.ID_A
Note that I actually had to abandon this whole effort, as my forests
were not mutually exclusive with respect to nodes. In other words,
where you have two separate trees{1,3,5} and {2,4}, I might have had
{1,3,5}, {2,4}, and {4,5}.
Note also that you might get into trouble with infinite recursion if
you have something like {3,5}, {5,1}, {5,3}, as 3 points to 5 which
points to 3 which points to.... You get the idea. The way around this
is usually to prime your recursive query with a starting value (or
hardcode a level/boundary), but I'm not sure you can do this as it
looks like you want your query to explore all trees on its own.
I'm no discrete math expert, but bottomeline is you should be OK if 1)
your trees don't share nodes and 2) your leaf nodes doint point back to
branches or the root (i.e., your graph is acyclic).
This is really a fascinating area of query writing. Thanks for posing
your question, and best of luck with your query.
Jeff de*******@yahoo.com wrote:
Hello,
I need to somehow relate certain records together in a table and I am
wondering if this is possible in SQL at all. Consider the table
definition below. I need to find all pairs that have associations
whether through id1 or id2. For the sample data set, the result should
be two groupings: (1, 3, 5) and (2,4). The result could be in any
format (rows and/or columns) as long as there is some indication of the
groupings so I can print out a report.
Here is the sql to create a test case with a small sample of the data:
create table pairs (id1 integer, id2 integer);
insert into pairs values (1, 1);
insert into pairs values (1, 5);
insert into pairs values (3, 3);
insert into pairs values (3, 5);
insert into pairs values (5, 1);
insert into pairs values (5, 3);
insert into pairs values (5, 5);
insert into pairs values (2, 4);
jefftyzzer wrote:
OKhere's the gist of what I came up with (with much help from those
who replied to my own posting):
WITH
PAIRS(ID_A, ID_B) AS
(
VALUES
(1,5),
(5,3),
(2,4)
),
PAIR_TREE(LEVEL, ID_A, ID_B)
AS
(
SELECT
DISTINCT 1,
PRS1.ID_A,
PRS1.ID_A
FROM
PAIRS PRS1
WHERE
NOT EXISTS
(
SELECT
1
FROM
PAIRS PRS2
WHERE
PRS2.ID_B = PRS1.ID_A
)
UNION ALL
SELECT
PT.LEVEL + 1,
PT.ID_A,
PRS.ID_B
FROM
PAIRS PRS,
PAIR_TREE PT
WHERE
PRS.ID_A = PT.ID_B
)
SELECT
DENSE_RANK() OVER(ORDER BY PT.ID_A),
CASE
LEVEL
WHEN
1
THEN
PT.ID_A
ELSE
PT.ID_B
END
FROM
PAIR_TREE PT
ORDER BY
PT.ID_A
Note that I actually had to abandon this whole effort, as my forests
were not mutually exclusive with respect to nodes. In other words,
where you have two separate trees{1,3,5} and {2,4}, I might have had
{1,3,5}, {2,4}, and {4,5}.
Note also that you might get into trouble with infinite recursion if
you have something like {3,5}, {5,1}, {5,3}, as 3 points to 5 which
points to 3 which points to.... You get the idea. The way around this
is usually to prime your recursive query with a starting value (or
hardcode a level/boundary), but I'm not sure you can do this as it
looks like you want your query to explore all trees on its own.
I'm no discrete math expert, but bottomeline is you should be OK if 1)
your trees don't share nodes and 2) your leaf nodes doint point back to
branches or the root (i.e., your graph is acyclic).
This is really a fascinating area of query writing. Thanks for posing
your question, and best of luck with your query.
Jeff
If you search, you'll see that Knut had a suggestion about storing the
path to detect recursion.
B.
>As a general rule, each record is always associated to itself. What I mean by association is based on a pretty loose definition: if there is a record [sic] with (x, y) in the table, then x is associated to y, and y is associated to x. The challenge is to identify all the chains of associations in the whole table. <<
Look up Cogito at www.cogitoinc.com. They make a graph analytic tool
for this kind of problem. I did part of a white paper for them in
which I used SQL to solve the "Six degrees of Kevin Bacon" problem.
Yes, it can be done; but much like making a size 26 thong, it shoudl
not be attempted with real data. This discussion thread is closed Replies have been disabled for this discussion. Similar topics
4 posts
views
Thread by Julia Briggs 
last post: by

36 posts
views
Thread by rbt 
last post: by

20 posts
views
Thread by CHIN 
last post: by

7 posts
views
Thread by Andrzej 
last post: by

2 posts
views
Thread by Bhupesh Naik 
last post: by

1 post
views
Thread by AAA 
last post: by

25 posts
views
Thread by Piotr Nowak 
last post: by

4 posts
views
Thread by RSH 
last post: by

7 posts
views
Thread by Robert S. 
last post: by

14 posts
views
Thread by bjorklund.emil 
last post: by
          