Let's say I have a table of users, and each user has a list of
categories. I could store each user's categories as TEXT with
delimeters like "cat1|cat2|cat3"
But then I need to be able to get a full list of everyone's categories,
without duplicates. Retrieving all the categories, exploding them, and
then removing the duplicates is a bit slow. Is there a better method? 8 1767
"Mark" <mn*******@gmail.comwrote in message
news:11**********************@s13g2000cwa.googlegr oups.com...
Let's say I have a table of users, and each user has a list of
categories. I could store each user's categories as TEXT with
delimeters like "cat1|cat2|cat3"
But then I need to be able to get a full list of everyone's categories,
without duplicates. Retrieving all the categories, exploding them, and
then removing the duplicates is a bit slow. Is there a better method?
Your solution violates 1st normal form and leads to many problems. Create a
table with 2 columns: userid, categoryid, and make the these columns the
key.
Rich
Rich Ryan wrote:
"Mark" <mn*******@gmail.comwrote in message
news:11**********************@s13g2000cwa.googlegr oups.com...
Let's say I have a table of users, and each user has a list of
categories. I could store each user's categories as TEXT with
delimeters like "cat1|cat2|cat3"
But then I need to be able to get a full list of everyone's categories,
without duplicates. Retrieving all the categories, exploding them, and
then removing the duplicates is a bit slow. Is there a better method?
Your solution violates 1st normal form and leads to many problems. Create a
table with 2 columns: userid, categoryid, and make the these columns the
key.
Rich
make what the key? the "these" columns?? i don't understand what you
mean.
and if i did it that method, wouldn't there be a lot of excessive data?
userid | categories
0 | life
0 | work
0 | web
1 | life
1 | work
1 | starcraft
2 | work
2 | starcraft
2 | programming
something like that..? i mean..i guess it works, but it seems like
wasted space. thought there might be a way to group everyone who has
the same category.
users
| userID | user | (userID is Primary)
| 01 | john |
| 02 | paul |
| 03 | george |
| 04 | ringo |
categories
| categoryID | category | (categoryID is Primary)
| 01 | lead vocals |
| 02 | lead guitar |
| 03 | keyboard |
| 04 | harmonica |
| 05 | backing vocals|
| 06 | drums |
| 07 | rhythm guitar |
| 08 | bass guitar |
userID_categoryID
| userID | categoryID | (the primary is made from both together!)
| 01 | 01 |
| 01 | 03 |
| 01 | 04 |
| 01 | 05 |
| 01 | 07 |
| 02 | 01 |
| 02 | 05 |
| 02 | 07 |
| 02 | 08 |
| 03 | 02 |
| 03 | 05 |
| etc | etc |
Mark wrote:
Rich Ryan wrote:
"Mark" <mn*******@gmail.comwrote in message
news:11**********************@s13g2000cwa.googlegr oups.com...
Let's say I have a table of users, and each user has a list of
categories. I could store each user's categories as TEXT with
delimeters like "cat1|cat2|cat3"
>
But then I need to be able to get a full list of everyone's categories,
without duplicates. Retrieving all the categories, exploding them, and
then removing the duplicates is a bit slow. Is there a better method?
>
Your solution violates 1st normal form and leads to many problems. Create a
table with 2 columns: userid, categoryid, and make the these columns the
key.
Rich
make what the key? the "these" columns?? i don't understand what you
mean.
and if i did it that method, wouldn't there be a lot of excessive data?
userid | categories
0 | life
0 | work
0 | web
1 | life
1 | work
1 | starcraft
2 | work
2 | starcraft
2 | programming
something like that..? i mean..i guess it works, but it seems like
wasted space. thought there might be a way to group everyone who has
the same category.
za*******@gmail.com wrote:
users
| userID | user | (userID is Primary)
| 01 | john |
| 02 | paul |
| 03 | george |
| 04 | ringo |
categories
| categoryID | category | (categoryID is Primary)
| 01 | lead vocals |
| 02 | lead guitar |
| 03 | keyboard |
| 04 | harmonica |
| 05 | backing vocals|
| 06 | drums |
| 07 | rhythm guitar |
| 08 | bass guitar |
userID_categoryID
| userID | categoryID | (the primary is made from both together!)
| 01 | 01 |
| 01 | 03 |
| 01 | 04 |
| 01 | 05 |
| 01 | 07 |
| 02 | 01 |
| 02 | 05 |
| 02 | 07 |
| 02 | 08 |
| 03 | 02 |
| 03 | 05 |
| etc | etc |
Mark wrote:
Rich Ryan wrote:
"Mark" <mn*******@gmail.comwrote in message
news:11**********************@s13g2000cwa.googlegr oups.com...
Let's say I have a table of users, and each user has a list of
categories. I could store each user's categories as TEXT with
delimeters like "cat1|cat2|cat3"
But then I need to be able to get a full list of everyone's categories,
without duplicates. Retrieving all the categories, exploding them, and
then removing the duplicates is a bit slow. Is there a better method?
>
Your solution violates 1st normal form and leads to many problems. Create a
table with 2 columns: userid, categoryid, and make the these columns the
key.
>
Rich
make what the key? the "these" columns?? i don't understand what you
mean.
and if i did it that method, wouldn't there be a lot of excessive data?
userid | categories
0 | life
0 | work
0 | web
1 | life
1 | work
1 | starcraft
2 | work
2 | starcraft
2 | programming
something like that..? i mean..i guess it works, but it seems like
wasted space. thought there might be a way to group everyone who has
the same category.
hm. thank you for clearing that up. is having 3 tables faster/more
efficient/save more space than having two tables? i guess there are
less strings stored, but there are twice as many rows necessary..
Mark wrote:
za*******@gmail.com wrote:
users
| userID | user | (userID is Primary)
| 01 | john |
| 02 | paul |
| 03 | george |
| 04 | ringo |
categories
| categoryID | category | (categoryID is Primary)
| 01 | lead vocals |
| 02 | lead guitar |
| 03 | keyboard |
| 04 | harmonica |
| 05 | backing vocals|
| 06 | drums |
| 07 | rhythm guitar |
| 08 | bass guitar |
userID_categoryID
| userID | categoryID | (the primary is made from both together!)
| 01 | 01 |
| 01 | 03 |
| 01 | 04 |
| 01 | 05 |
| 01 | 07 |
| 02 | 01 |
| 02 | 05 |
| 02 | 07 |
| 02 | 08 |
| 03 | 02 |
| 03 | 05 |
| etc | etc |
Mark wrote:
Rich Ryan wrote:
"Mark" <mn*******@gmail.comwrote in message
news:11**********************@s13g2000cwa.googlegr oups.com...
Let's say I have a table of users, and each user has a list of
categories. I could store each user's categories as TEXT with
delimeters like "cat1|cat2|cat3"
>
But then I need to be able to get a full list of everyone's categories,
without duplicates. Retrieving all the categories, exploding them, and
then removing the duplicates is a bit slow. Is there a better method?
>
Your solution violates 1st normal form and leads to many problems. Create a
table with 2 columns: userid, categoryid, and make the these columns the
key.
Rich
>
make what the key? the "these" columns?? i don't understand what you
mean.
>
and if i did it that method, wouldn't there be a lot of excessive data?
>
userid | categories
0 | life
0 | work
0 | web
1 | life
1 | work
1 | starcraft
2 | work
2 | starcraft
2 | programming
>
something like that..? i mean..i guess it works, but it seems like
wasted space. thought there might be a way to group everyone who has
the same category.
hm. thank you for clearing that up. is having 3 tables faster/more
efficient/save more space than having two tables? i guess there are
less strings stored, but there are twice as many rows necessary..
Well, I'm definitely not qualified to comment on efficiency but I'm
sure there must be lots out there on the performance comparisons of
flat tables vs normalized dbs.
In this simple example, there's probably not a lot in it. By putting
the categories in a separate table, I'm reducing the risk of errors in
user input - or at least making those errors more consistent! If the
categories also had descriptions, for instance, then the performance
benefits would become more apparent.
But the db I've suggested is a poor example for demonstrating the real
benefits of (at least some degree of) normalization. I'd write a better
one - but there's SO many well-written tutorials already out there on
db construction and normalization that it hardly seems worth it :-)
Indeed, in the example provided the three table is faster. Overall though,
the performance of your tables depend on the number of rows. If your number
of rows in any of the tables is not going to surpass maybe 400,000 or
500,000 rows, the denormalized solution is the way to go; however, there is
a point of diminishing return that you need to be aware of by examining the
logs and the performance of your database.
Hope this helps.
On 7/11/06 02:33, in article 11*********************@b28g2000cwb.googlegroups.c om, "Mark"
<mn*******@gmail.comwrote:
> za*******@gmail.com wrote:
>users | userID | user | (userID is Primary) | 01 | john | | 02 | paul | | 03 | george | | 04 | ringo |
categories | categoryID | category | (categoryID is Primary) | 01 | lead vocals | | 02 | lead guitar | | 03 | keyboard | | 04 | harmonica | | 05 | backing vocals| | 06 | drums | | 07 | rhythm guitar | | 08 | bass guitar |
userID_categoryID | userID | categoryID | (the primary is made from both together!) | 01 | 01 | | 01 | 03 | | 01 | 04 | | 01 | 05 | | 01 | 07 | | 02 | 01 | | 02 | 05 | | 02 | 07 | | 02 | 08 | | 03 | 02 | | 03 | 05 | | etc | etc |
Mark wrote:
>>Rich Ryan wrote: "Mark" <mn*******@gmail.comwrote in message news:11**********************@s13g2000cwa.googl egroups.com... Let's say I have a table of users, and each user has a list of categories. I could store each user's categories as TEXT with delimeters like "cat1|cat2|cat3" > But then I need to be able to get a full list of everyone's categories, without duplicates. Retrieving all the categories, exploding them, and then removing the duplicates is a bit slow. Is there a better method? >
Your solution violates 1st normal form and leads to many problems. Create a table with 2 columns: userid, categoryid, and make the these columns the key.
Rich
make what the key? the "these" columns?? i don't understand what you mean.
and if i did it that method, wouldn't there be a lot of excessive data?
userid | categories 0 | life 0 | work 0 | web 1 | life 1 | work 1 | starcraft 2 | work 2 | starcraft 2 | programming
something like that..? i mean..i guess it works, but it seems like wasted space. thought there might be a way to group everyone who has the same category.
hm. thank you for clearing that up. is having 3 tables faster/more
efficient/save more space than having two tables? i guess there are
less strings stored, but there are twice as many rows necessary..
Mark wrote:
hm. thank you for clearing that up. is having 3 tables faster/more
efficient/save more space than having two tables? i guess there are
less strings stored, but there are twice as many rows necessary..
Mark,
As Rich first pointed out Normalization is important for data quality
and performance. The number of tables and rows alone does not determine
performance. Proper primary and foreign keys between related tables
along with indexes on columns used for conditions in WHERE clauses has
a greater effect on performance, since that controls how the query is
parsed and how the data engine determines how to fetch the data.
Whereas, hacking delimiters for nested values will tend to slow things
down.
Imagine that you went to the library and the books were simply stacked
on the shelves in whatever order could cram in the most books on the
least shelves using the least staff. That'd be easy for the library to
file the books, but a hassle for patrons to find the books...
-DHS-
Skarjune wrote:
Mark wrote:
hm. thank you for clearing that up. is having 3 tables faster/more
efficient/save more space than having two tables? i guess there are
less strings stored, but there are twice as many rows necessary..
Mark,
As Rich first pointed out Normalization is important for data quality
and performance. The number of tables and rows alone does not determine
performance. Proper primary and foreign keys between related tables
along with indexes on columns used for conditions in WHERE clauses has
a greater effect on performance, since that controls how the query is
parsed and how the data engine determines how to fetch the data.
Whereas, hacking delimiters for nested values will tend to slow things
down.
Imagine that you went to the library and the books were simply stacked
on the shelves in whatever order could cram in the most books on the
least shelves using the least staff. That'd be easy for the library to
file the books, but a hassle for patrons to find the books...
-DHS-
Thanks for explaining all this to me guys :) I knew there was a proper
or better method to approach this problem, but I guess I'm still sort
of new to databases, and wasn't sure what it was. Perhaps I'll google
some stuff on normalization and find out more. Anyways, this should
solve my problem. Thanks a ton! This discussion thread is closed Replies have been disabled for this discussion. Similar topics
45 posts
views
Thread by Joh |
last post: by
|
2 posts
views
Thread by Gerry |
last post: by
|
1 post
views
Thread by MHenry |
last post: by
|
16 posts
views
Thread by tyrfboard |
last post: by
|
7 posts
views
Thread by vsgdp |
last post: by
|
6 posts
views
Thread by dawnerd |
last post: by
|
3 posts
views
Thread by ryan.paquette |
last post: by
| |
11 posts
views
Thread by =?ISO-8859-1?Q?Jean=2DFran=E7ois_Michaud?= |
last post: by
| | | | | | | | | | | |