469,362 Members | 2,447 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,362 developers. It's quick & easy.

Is this possible with SQL?

Hello,

I need to somehow relate certain records together in a table and I am
wondering if this is possible in SQL at all. Consider the table
definition below. I need to find all pairs that have associations
whether through id1 or id2. For the sample data set, the result should
be two groupings: (1, 3, 5) and (2,4). The result could be in any
format (rows and/or columns) as long as there is some indication of the
groupings so I can print out a report.
Here is the sql to create a test case with a small sample of the data:

create table pairs (id1 integer, id2 integer);

insert into pairs values (1, 1);
insert into pairs values (1, 5);

insert into pairs values (3, 3);
insert into pairs values (3, 5);
insert into pairs values (5, 1);

insert into pairs values (5, 3);
insert into pairs values (5, 5);

insert into pairs values (2, 4);

Nov 19 '06 #1
12 1679

de*******@yahoo.com wrote:
Hello,

I need to somehow relate certain records together in a table and I am
wondering if this is possible in SQL at all. Consider the table
definition below. I need to find all pairs that have associations
whether through id1 or id2. For the sample data set, the result should
be two groupings: (1, 3, 5) and (2,4). The result could be in any
format (rows and/or columns) as long as there is some indication of the
groupings so I can print out a report.
In general I would say no, this is not *practically* possible in SQL.
What you are describing is an unrestricted graph and sql does not
handle this very well. See for example this article:

* Maintaining the transitive closure of graphs in SQL.
Guozhu Dong, Leonid Libkin, Jianwen Su and Limsoon Wong.
Int. Journal of Information Technology, 5 (1999), 46-78.

http://citeseer.ist.psu.edu/cache/pa...aintaining.pdf

Especially chapter 4: Transitive closure of arbitrary directed graphs
If you could restrict your graph somehow, there might be easier
solutions lurking around. Trees for example are quite easy to handle,
and there are a number of well known methods.

/Lennart

Nov 19 '06 #2
>I need to somehow relate certain records [sic] together in a table and I am wondering if this is possible in SQL at all. Consider the table definition below. I need to find all pairs that have associations whether through id1 or id2. <<

Can I assume that your DDL should have been:

CREATE TABLE Pairs
(id1 INTEGER NOT NULL,
id2 INTEGER NOT NULL,
PRIMARY KEY(id1, id2));

Can you define what you mean by "association" ? I sse that {1,3,5} all
are in a group with (id1 = 5) , but since you do not have a (2,2) pair,
how did you get {2,4} ?

Nov 20 '06 #3
Lennart wrote:
In general I would say no, this is not *practically* possible in SQL.
What you are describing is an unrestricted graph and sql does not
handle this very well. See for example this article:

* Maintaining the transitive closure of graphs in SQL.
Guozhu Dong, Leonid Libkin, Jianwen Su and Limsoon Wong.
Int. Journal of Information Technology, 5 (1999), 46-78.

http://citeseer.ist.psu.edu/cache/pa...aintaining.pdf

Especially chapter 4: Transitive closure of arbitrary directed graphs

Thanks a lot for the reference. I am just reading it now and will see
what assumptions we can make about the data to simplify the solution.

Nov 20 '06 #4
--CELKO-- wrote:
I need to somehow relate certain records [sic] together in a table and I am wondering if this is possible in SQL at all. Consider the table definition below. I need to find all pairs that have associations whether through id1 or id2. <<

Can I assume that your DDL should have been:

CREATE TABLE Pairs
(id1 INTEGER NOT NULL,
id2 INTEGER NOT NULL,
PRIMARY KEY(id1, id2));

Yes, that is right. I did not include the constraints for sake of
brevity.

Can you define what you mean by "association" ? I sse that {1,3,5} all
are in a group with (id1 = 5) , but since you do not have a (2,2) pair,
how did you get {2,4} ?
Good observation! This is my mistake. The full set should also
include records (2,2), (4,2) and (4,4).

As a general rule, each record is always associated to itself. What I
mean by association is based on a pretty loose definition: if there is
a record with (x, y) in the table, then x is associated to y, and y is
associated to x. The challenge is to identify all the chains of
associations in the whole table.

Thanks!

Nov 20 '06 #5

de*******@yahoo.com wrote:
Hello,

I need to somehow relate certain records together in a table and I am
wondering if this is possible in SQL at all. Consider the table
definition below. I need to find all pairs that have associations
whether through id1 or id2. For the sample data set, the result should
be two groupings: (1, 3, 5) and (2,4). The result could be in any
format (rows and/or columns) as long as there is some indication of the
groupings so I can print out a report.
Here is the sql to create a test case with a small sample of the data:

create table pairs (id1 integer, id2 integer);

insert into pairs values (1, 1);
insert into pairs values (1, 5);

insert into pairs values (3, 3);
insert into pairs values (3, 5);
insert into pairs values (5, 1);

insert into pairs values (5, 3);
insert into pairs values (5, 5);

insert into pairs values (2, 4);
Isn't this just a recursive query?

B.

Nov 20 '06 #6

de*******@yahoo.com wrote:
--CELKO-- wrote:
[...]
Can you define what you mean by "association" ? I sse that {1,3,5} all
are in a group with (id1 = 5) , but since you do not have a (2,2) pair,
how did you get {2,4} ?

Good observation! This is my mistake. The full set should also
include records (2,2), (4,2) and (4,4).
Why is that necessary? If the graph is undirected why not store each
edge once, and remove the loops (x,x)? With your sample data that would
leave us with:

insert into pairs values (1, 5);
insert into pairs values (3, 5);
insert into pairs values (2, 4);

A constraint like CHECK ( id1 < id2 ) could be of help.

By storing both directions and the loops you will have to verify that
for each (x,y) you also have (x,x) (y,y) and (y,x).

Just my 2 euro thoughts

/Lennart

[...]

Nov 20 '06 #7

Lennart wrote:
[...]
A constraint like CHECK ( id1 < id2 ) could be of help.
Note that it doesnt prevent one from adding, say (1,3) to Pairs

/Lennart

Nov 20 '06 #8
Can we assume that you can get rid of identity-type pairs, e.g., {1,1},
as well as one of any reciprocals, e.g., remove either {3,5} or {5,3}?

While you're mulling that over, you might want to take a look at a
posting of mine in this forum titled "Trees, recursion, and grouping,"
in which I was asking in essence the same question.

Some very helpful queries were proferred, as was a clarification by Joe
that I was not dealing with a tree, but rather with a forest, as you
are (after de-duping as I suggest above, you are left w/ two
independent root nodes: 1 and 2).

In a follow-up to this posting, I will post the query I ended up with
(as soon as I track it down :-).

--Jeff

de*******@yahoo.com wrote:
Hello,

I need to somehow relate certain records together in a table and I am
wondering if this is possible in SQL at all. Consider the table
definition below. I need to find all pairs that have associations
whether through id1 or id2. For the sample data set, the result should
be two groupings: (1, 3, 5) and (2,4). The result could be in any
format (rows and/or columns) as long as there is some indication of the
groupings so I can print out a report.
Here is the sql to create a test case with a small sample of the data:

create table pairs (id1 integer, id2 integer);

insert into pairs values (1, 1);
insert into pairs values (1, 5);

insert into pairs values (3, 3);
insert into pairs values (3, 5);
insert into pairs values (5, 1);

insert into pairs values (5, 3);
insert into pairs values (5, 5);

insert into pairs values (2, 4);
Nov 21 '06 #9
Great paper--thanks for sharing it!

--Jeff

Lennart wrote:
de*******@yahoo.com wrote:
Hello,

I need to somehow relate certain records together in a table and I am
wondering if this is possible in SQL at all. Consider the table
definition below. I need to find all pairs that have associations
whether through id1 or id2. For the sample data set, the result should
be two groupings: (1, 3, 5) and (2,4). The result could be in any
format (rows and/or columns) as long as there is some indication of the
groupings so I can print out a report.

In general I would say no, this is not *practically* possible in SQL.
What you are describing is an unrestricted graph and sql does not
handle this very well. See for example this article:

* Maintaining the transitive closure of graphs in SQL.
Guozhu Dong, Leonid Libkin, Jianwen Su and Limsoon Wong.
Int. Journal of Information Technology, 5 (1999), 46-78.

http://citeseer.ist.psu.edu/cache/pa...aintaining.pdf

Especially chapter 4: Transitive closure of arbitrary directed graphs
If you could restrict your graph somehow, there might be easier
solutions lurking around. Trees for example are quite easy to handle,
and there are a number of well known methods.

/Lennart
Nov 21 '06 #10
OK--here's the gist of what I came up with (with much help from those
who replied to my own posting):

WITH
PAIRS(ID_A, ID_B) AS
(
VALUES
(1,5),
(5,3),
(2,4)
),
PAIR_TREE(LEVEL, ID_A, ID_B)
AS
(
SELECT
DISTINCT 1,
PRS1.ID_A,
PRS1.ID_A
FROM
PAIRS PRS1
WHERE
NOT EXISTS
(
SELECT
1
FROM
PAIRS PRS2
WHERE
PRS2.ID_B = PRS1.ID_A
)
UNION ALL
SELECT
PT.LEVEL + 1,
PT.ID_A,
PRS.ID_B
FROM
PAIRS PRS,
PAIR_TREE PT
WHERE
PRS.ID_A = PT.ID_B
)
SELECT
DENSE_RANK() OVER(ORDER BY PT.ID_A),
CASE
LEVEL
WHEN
1
THEN
PT.ID_A
ELSE
PT.ID_B
END
FROM
PAIR_TREE PT
ORDER BY
PT.ID_A

Note that I actually had to abandon this whole effort, as my forests
were not mutually exclusive with respect to nodes. In other words,
where you have two separate trees--{1,3,5} and {2,4}, I might have had
{1,3,5}, {2,4}, and {4,5}.

Note also that you might get into trouble with infinite recursion if
you have something like {3,5}, {5,1}, {5,3}, as 3 points to 5 which
points to 3 which points to.... You get the idea. The way around this
is usually to prime your recursive query with a starting value (or
hard-code a level/boundary), but I'm not sure you can do this as it
looks like you want your query to explore all trees on its own.

I'm no discrete math expert, but bottome-line is you should be OK if 1)
your trees don't share nodes and 2) your leaf nodes doint point back to
branches or the root (i.e., your graph is acyclic).

This is really a fascinating area of query writing. Thanks for posing
your question, and best of luck with your query.

--Jeff

de*******@yahoo.com wrote:
Hello,

I need to somehow relate certain records together in a table and I am
wondering if this is possible in SQL at all. Consider the table
definition below. I need to find all pairs that have associations
whether through id1 or id2. For the sample data set, the result should
be two groupings: (1, 3, 5) and (2,4). The result could be in any
format (rows and/or columns) as long as there is some indication of the
groupings so I can print out a report.
Here is the sql to create a test case with a small sample of the data:

create table pairs (id1 integer, id2 integer);

insert into pairs values (1, 1);
insert into pairs values (1, 5);

insert into pairs values (3, 3);
insert into pairs values (3, 5);
insert into pairs values (5, 1);

insert into pairs values (5, 3);
insert into pairs values (5, 5);

insert into pairs values (2, 4);
Nov 21 '06 #11
jefftyzzer wrote:
OK--here's the gist of what I came up with (with much help from those
who replied to my own posting):

WITH
PAIRS(ID_A, ID_B) AS
(
VALUES
(1,5),
(5,3),
(2,4)
),
PAIR_TREE(LEVEL, ID_A, ID_B)
AS
(
SELECT
DISTINCT 1,
PRS1.ID_A,
PRS1.ID_A
FROM
PAIRS PRS1
WHERE
NOT EXISTS
(
SELECT
1
FROM
PAIRS PRS2
WHERE
PRS2.ID_B = PRS1.ID_A
)
UNION ALL
SELECT
PT.LEVEL + 1,
PT.ID_A,
PRS.ID_B
FROM
PAIRS PRS,
PAIR_TREE PT
WHERE
PRS.ID_A = PT.ID_B
)
SELECT
DENSE_RANK() OVER(ORDER BY PT.ID_A),
CASE
LEVEL
WHEN
1
THEN
PT.ID_A
ELSE
PT.ID_B
END
FROM
PAIR_TREE PT
ORDER BY
PT.ID_A

Note that I actually had to abandon this whole effort, as my forests
were not mutually exclusive with respect to nodes. In other words,
where you have two separate trees--{1,3,5} and {2,4}, I might have had
{1,3,5}, {2,4}, and {4,5}.

Note also that you might get into trouble with infinite recursion if
you have something like {3,5}, {5,1}, {5,3}, as 3 points to 5 which
points to 3 which points to.... You get the idea. The way around this
is usually to prime your recursive query with a starting value (or
hard-code a level/boundary), but I'm not sure you can do this as it
looks like you want your query to explore all trees on its own.

I'm no discrete math expert, but bottome-line is you should be OK if 1)
your trees don't share nodes and 2) your leaf nodes doint point back to
branches or the root (i.e., your graph is acyclic).

This is really a fascinating area of query writing. Thanks for posing
your question, and best of luck with your query.

--Jeff
If you search, you'll see that Knut had a suggestion about storing the
path to detect recursion.

B.

Nov 21 '06 #12
>As a general rule, each record is always associated to itself. What I mean by association is based on a pretty loose definition: if there is a record [sic] with (x, y) in the table, then x is associated to y, and y is associated to x. The challenge is to identify all the chains of associations in the whole table. <<

Look up Cogito at www.cogitoinc.com. They make a graph analytic tool
for this kind of problem. I did part of a white paper for them in
which I used SQL to solve the "Six degrees of Kevin Bacon" problem.
Yes, it can be done; but much like making a size 26 thong, it shoudl
not be attempted with real data.

Nov 21 '06 #13

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

4 posts views Thread by Julia Briggs | last post: by
36 posts views Thread by rbt | last post: by
1 post views Thread by AAA | last post: by
25 posts views Thread by Piotr Nowak | last post: by
14 posts views Thread by bjorklund.emil | last post: by
reply views Thread by zhoujie | last post: by
reply views Thread by suresh191 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.