I was wondering how to determine whether an information is redundant or not.
if you have in a table "Main", the basic information for a thing:
Expand|Select|Wrap|Line Numbers
- thing_id | name | image_id | city_id
because i have the location organized this way:
Table cities:
Expand|Select|Wrap|Line Numbers
- city_id | country_id | city_name
Expand|Select|Wrap|Line Numbers
- country_id | country_name
i will need (to get the country name of a particular thing):
1. the "country_id" which is in Table Cities
2. the "country_name" which is in Table Countries
B)
But i could have organized my tables differently by putting the "country_id" in the main table, in order to get one single subquery:
1."country_name" which is in Table Countries
so what would be the right organization?
C)
notice that i could have put "country_name" directly in Table Cities, by what, i would have reduced my 2 subqueries to 1 single in the firs example:
1."country_name" which is in table cities
D)
or i could have directly put the country_name in Table Main:
0. i wouldn't need to subquery that information.
ok so what will determine whether to make multiple tables?
-> whenever i get to a new information node, should i split the information into another table? like-> the City has 2 branches, then avoid redundant information into the Main table and take City information to another..
Maybe this is not the best example, this one may be clearer:
I have a Table Car:
Expand|Select|Wrap|Line Numbers
- car_id | city_id
each of these parts are compound of others: engine->[subpart 1, 2, 3, 4, 5...])
So for each of these car parts, i should have a table, (a car may be able to receive different engines, so i want to be able to track all the ones that suit).
ex: Table engines
Expand|Select|Wrap|Line Numbers
- engine_id | engine_name | car_id | sub_part1 | sub_part2 | sub_part3 ...
Expand|Select|Wrap|Line Numbers
- SELECT engine_name
- FROM engines
- WHERE car_id = 'mycar'
Expand|Select|Wrap|Line Numbers
- SELECT engine_name
- FROM engines
- WHERE car_id IN (
- SELECT car_id
- FROM cars
- WHERE city_id = 'mycity'
- )
it would have been quicker for this query to have a column city_id in the Table Engines to be able to do:
Expand|Select|Wrap|Line Numbers
- SELECT engine_name
- FROM engines
- WHERE city_id = "mycityid"
whilst writing all this post, i am getting to the conclusion that the fact that will determine whether a table should include some redundant information or not, will depend on the queries you make to it.
if i use "Engines" Table more often to get all the "engine_names" in a "city", i'd rather include the "city_id" in that table, else, if i use the table more often to get just the "engine_names" for a specific "car_id", i'd rather not include this redundant information.
"more often" is a bit unprecise, i should say:
if
Expand|Select|Wrap|Line Numbers
- ((time1) x number_queries1) + ((time2) x number_queries2)
- for first table design
- >
- ((time1) x number_queries1) + ((time2) x number_queries2)
- for second table design,
General rule:
Expand|Select|Wrap|Line Numbers
- use the table design for which
- the cumulated execution time for each query
- multiplied by their respective amount of requests
- is the lowest!
That is funny, it's a good thing to write posts it helps thinking!
i normally don't post when i get myself the answer, but it may be useful for others... despite it's length