472,107 Members | 1,238 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,107 software developers and data experts.

Self join to multiple references...

Is there a way to store an indefinite number of keys in one field and
self join them? Imagine, for example, you have a table of users with a
"friends" column. Say user 1 is friends with users 9, 7, 24 and 6; then
would it be possible to store this
array/list/whatever-you-want-to-call-it into user 1's friends cell and
extract the join in a query, some how? An easy and obvious way, that
is; I can think of string processing hacks ;)

Thanks :)
Christopher Harrison

Oct 25 '05 #1
6 2556
>Is there a way to store an indefinite number of keys in one field and
self join them?
Under the right circumstances, group_concat() might do that.
Imagine, for example, you have a table of users with a
"friends" column. Say user 1 is friends with users 9, 7, 24 and 6; then
would it be possible to store this
array/list/whatever-you-want-to-call-it into user 1's friends cell and
extract the join in a query, some how?
I can't imagine why you'd want to STORE the info that way. It makes
it very inconvenient to access. I can see why you might want to
display it that way if the screen and/or paper were wide enough to
accomodate the person with the most friends.
An easy and obvious way, that
is; I can think of string processing hacks ;)


Gordon L. Burditt
Oct 25 '05 #2
It came to me, about an hour after I posted my message, a reasonable
way to do this. I'd just like to ask people's opinion on my technique
(I'm kind of a newbie ;)

Basically, each user doesn't have a "friends" column anymore, they have
their own table (e.g. "friends_of_user123") that lists their friends'
ids from the users table. One could make the user's friends' ids the
primary key to avoid duplicates.
The bonus to this method is that the table could be used to store other
information (e.g. the user's relationship with each friend: best
friend; casual acquaintence; etc.). These friends tables could be
generated easily with an SQL script. The problem lies in having a great
heap of tables, as each user is going to require one of their own...
Although it would work, it seems to me like there's a great deal of
redundancy that should be simplifed.

Any thoughts?

Christopher Harrison
Gordon Burditt wrote:
Is there a way to store an indefinite number of keys in one field and
self join them?


Under the right circumstances, group_concat() might do that.
Imagine, for example, you have a table of users with a
"friends" column. Say user 1 is friends with users 9, 7, 24 and 6; then
would it be possible to store this
array/list/whatever-you-want-to-call-it into user 1's friends cell and
extract the join in a query, some how?


I can't imagine why you'd want to STORE the info that way. It makes
it very inconvenient to access. I can see why you might want to
display it that way if the screen and/or paper were wide enough to
accomodate the person with the most friends.
An easy and obvious way, that
is; I can think of string processing hacks ;)


Gordon L. Burditt


Oct 26 '05 #3
>It came to me, about an hour after I posted my message, a reasonable
way to do this. I'd just like to ask people's opinion on my technique
(I'm kind of a newbie ;)

Basically, each user doesn't have a "friends" column anymore, they have
their own table (e.g. "friends_of_user123") that lists their friends'
ids from the users table. One could make the user's friends' ids the
primary key to avoid duplicates.
Naming tables based on a field is almost always a bad idea. Consider
adding a column, and put the table name, or the variable part of
it, into THAT. Now throw all the rows into a single table. E.g.
you have a table "friends", with columns "user" and "friend". Add
a unique index on (user,friend).

The one exception I can think of to the above rule is archival data.
Say, you've got different tables for transactions in different
years. Normally, I'd say put them all in one table, *BUT* if the
idea for separating them is to remove the old transactions from the
active database to CD-ROM or tape or something because you don't
have enough storage for 80 years of transactions, that's a reasonable
approach.

This kind of "link" table is common for a many-to-many relationship.
(e.g. A is a DVD title owned by B). If A is related to B, you put
in a row for (A,B), using the primary keys of the other tables
involved. You can have many rows with A in the left column, and
many rows with A in the right column. Often there is the restriction
that you may only have (A,B) in there ONCE (there's no "A is a
triple-friend of B"), so you have a unique index on the pair of
columns. You get the list of A's friends with select friend from
friends where user = 'A'. You can also get a list of the people
who consider B a friend with select user from friends where friend
= 'B'.

I'm assuming here that you are not trying to represent a two-way
relationship here. In other words, if A considers B to be a friend,
it doesn't necessarily mean that B considers A to be a friend, but
you're insisting on it anyway. Sometimes that holds: if A is
married to B, normally B is also married to A (anyone know of real
exceptions to this?). However A may *THINK* he is married to B,
but B is a trigamist and A isn't her first husband. But don't count
on being able to label the columns "husband" and "wife". In some
states that doesn't hold.

All of the ways mentioned (all crammed into one field, lists in
different tables, link table listing each pair of people and friend)
don't deal with an assumption "if A is a friend of B, then B must
be a friend of A" well, and it takes extra code to enforce that if
you insist on doing so. It also makes it hard to determine who can
control the friends lists, and if A is stalking B, you may have to
deal with court orders from B to remove "B is a friend of A" and
not let A put it back.
The bonus to this method is that the table could be used to store other
information (e.g. the user's relationship with each friend: best
friend; casual acquaintence; etc.).
"link" tables don't usually have this extra info but it is an appropriate
place to put it in some circumstances.
These friends tables could be
generated easily with an SQL script. The problem lies in having a great
heap of tables, as each user is going to require one of their own... Although it would work, it seems to me like there's a great deal of
redundancy that should be simplifed.


One "link" table is, I think, a much better solution. It also allows
queries like "how many people have B as a friend"? which the
variable-table-name approach doesn't (at least not easily).

Gordon L. Burditt
Oct 26 '05 #4
> Naming tables based on a field is almost always a bad idea. Consider
adding a column, and put the table name, or the variable part of
it, into THAT. Now throw all the rows into a single table. E.g.
you have a table "friends", with columns "user" and "friend". Add
a unique index on (user,friend).


Ahh, that might be a better idea ;) I was thinking that, like you say,
naming tables after fields is a bit dodgy... So if I set multiple
fields as the primary keys, data in one of the fields doesn't
necessarily need to be unique, just so long as, combined with the other
fields, it is a unique tuple? Thanks for that :)
Fortunately I'm not trying to represent a two-way relationship; well,
it can be either/or. I realise it bloats things somewhat, but in that
event, we can just have (A,B) and (B,A) as records. I guess we could
have an extra boolean field for two-way relationships; but then, if I
had my status field, the two friends would have to consider themselves
equal partners (i.e. both good friends, etc.), which might not be
realistic.

Thanks very much :)
Christopher Harrison

P.S. I realise it's a bit contrived, but whilst you can be married to a
mafia don (and so, vicariously, married to the mob), it doesn't really
make sense for the mob to be married to you ;)

Oct 26 '05 #5
>> Naming tables based on a field is almost always a bad idea. Consider
adding a column, and put the table name, or the variable part of
it, into THAT. Now throw all the rows into a single table. E.g.
you have a table "friends", with columns "user" and "friend". Add
a unique index on (user,friend).
Ahh, that might be a better idea ;) I was thinking that, like you say,
naming tables after fields is a bit dodgy... So if I set multiple
fields as the primary keys, data in one of the fields doesn't
necessarily need to be unique, just so long as, combined with the other
fields, it is a unique tuple? Thanks for that :)


Yep. I do that all the time, for instance, in activity logs, which
record stuff like hourly mail traffic through various hosts back a
few years. The primary key is likely to be the fields (timestamp,
host, statistic) where there are plenty (hundreds) of identical
timestamps (usually a multiple of 1 hour), plenty (many thousands)
of records for the same host, and plenty (millions) of records for
the same statistic (mail received, mail delivered, viruses rejected,
mail bounced, etc. There's a big enum for all the statistics names)
but only one count for any combination of the three.
Fortunately I'm not trying to represent a two-way relationship; well,
it can be either/or. I realise it bloats things somewhat, but in that
event, we can just have (A,B) and (B,A) as records. I guess we could
have an extra boolean field for two-way relationships; but then, if I
had my status field, the two friends would have to consider themselves
equal partners (i.e. both good friends, etc.), which might not be
realistic.
Think about the query needed to determine if A is related to B in
a two-way setup, leaving aside any extra data going into the link
table, like how serious/type of relationship. The extra data makes
it messier and putting in two records, one for each way, looks more
and more attractive. Even without the extra data, spending the
table space for performance and less complexity may be a big win.

SELECT count(*) FROM friends where user = 'A' and friend = 'B';

will return 1 if A is a friend of B, and 0 if not. If you're trying to
represent a two-way relationship, you need something like:

SELECT count(*) FROM friends where (user = 'A' and friend = 'B') or
(friend = 'A' and user = 'B' and two_way_flag = 'Y')

which is less efficient. It also might manage to return 2 (the
primary key constraint won't prevent (A,B) and (B,A) both existing.)

Adequately updating this also requires some effort (and locking)
when a user decides to delete a friend. You might have to turn a
two-way relationship into a one-way one.

Trying to do joins between a user and his friends (the user table
gets joined twice, once for the user, once for the friend, and the
friends table gets joined once) to determine, say, the distribution
of age differences between users and their friends gets MUCH messier
with the two-way setup. You can't join on variable column names
or variable table names.

select (u1.birthyear - u2.birthyear) as diff, count(u1.birthyear)
from user u1, user u2, friend f where u1.id = f.user and
u2.id = f.friend group by diff;
P.S. I realise it's a bit contrived, but whilst you can be married to a
mafia don (and so, vicariously, married to the mob), it doesn't really
make sense for the mob to be married to you ;)


That's not the definition of marriage that Texas uses (Texas has a
constitutional amendment on the ballot defining marriage as a union
of one man and one woman. I would prefer to see marriage defined
as the union of two social security numbers on a Form 1040, or
better, define marriage as whatever your church says and leave the
state out of it completely so, for instance, that "marital status"
box on tax forms has to go away.)

You might actually end up with A married to B but not B married to
A if A and B are living in different states (I believe there are
some contrived situations where the state line runs through the
marriage bed) with different laws regarding recognition of homosexual
marriage.

Gordon L. Burditt
Oct 26 '05 #6
Hey Christopher,

Yes, you could do that, but why?

Why not just create a simple table like:

CREATE TABLE UserFriends(
UserID int,
FriendUserID int)

"Christopher Harrison" <Sp*********@ChrisHarrison.co.uk> wrote in message
news:11**********************@g44g2000cwa.googlegr oups.com...
Is there a way to store an indefinite number of keys in one field and
self join them? Imagine, for example, you have a table of users with a
"friends" column. Say user 1 is friends with users 9, 7, 24 and 6; then
would it be possible to store this
array/list/whatever-you-want-to-call-it into user 1's friends cell and
extract the join in a query, some how? An easy and obvious way, that
is; I can think of string processing hacks ;)

Thanks :)
Christopher Harrison

Oct 26 '05 #7

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

1 post views Thread by Paul Bramscher | last post: by
7 posts views Thread by steve | last post: by
14 posts views Thread by Tieche Bruce A MSgt USMTM/AFD | last post: by
6 posts views Thread by John E | last post: by
3 posts views Thread by sks | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.