I have a VERY basic question about figuring database size. I've inherited a
database which is generally similar to this basic one:
Item, Red, Blue, Green, Yellow
(text), (int),(int),(int),(int)
box, 1,0,0,2
hat, 0,0,0,1
car, 3,0,0,0
This format leads to a lot of zeros in the rows which take up a lot of
space. Say I change it to this format:
Item, Color, Number
(text), (text), (int)
box, Red, 1
box, Yellow, 2
hat, Yellow, 1
car, Red, 3
There will be more rows this way (since there needs to be an extra row for
the 2nd box color) but not as many columns.
My question is how much space does this save. Each int number takes up 4
bytes, right? My question is how much space the row itself takes up. Another
way to ask is how much space, if any, does the row itself take up in the
database?
Thanks. 6 5063
On Mon, 12 Jul 2004 20:18:04 -0700, "DH" <dana_hartley@(remove)hotmail.com>
wrote: I have a VERY basic question about figuring database size. I've inherited a database which is generally similar to this basic one:
Item, Red, Blue, Green, Yellow (text), (int),(int),(int),(int) box, 1,0,0,2 hat, 0,0,0,1 car, 3,0,0,0
This format leads to a lot of zeros in the rows which take up a lot of space. Say I change it to this format:
Item, Color, Number (text), (text), (int) box, Red, 1 box, Yellow, 2 hat, Yellow, 1 car, Red, 3
There will be more rows this way (since there needs to be an extra row for the 2nd box color) but not as many columns.
My question is how much space does this save. Each int number takes up 4 bytes, right? My question is how much space the row itself takes up. Another way to ask is how much space, if any, does the row itself take up in the database?
Thanks.
Those of us who have been working with databases for a long time have pretty
much ceased worrying about the space issue very much. The actual space
equation is calculable, but depends on many factors including how many zeroes
vs non-seroes you need to represent. Furthermore, you should really normalize
that one more step since there are only 3 colors, and no reason to repeat the
text of the color name on each row.
The real issue to be concerned about here is what data representation makes
the most sense in terms of how it's used. In the case of an RGB color system,
the argument could probably be made that there are always exactly 3 color
components, and that is unlikely to change, so the separate Red, Green, and
Blue fields make sense.
You don't need a structure that allows more than 3 components, and
furthermore, what is the meaning of lacking a component? You're suggesting
that it should mean zero, but that's hardly obvious by the schema. The lack
of a record or a Null generally means there is no meaningful data, not that
there is a meaningful zero value. If you make this explicit by always having
color rows for all 3 components, then you aren't saving anything.
> Those of us who have been working with databases for a long time have
pretty much ceased worrying about the space issue very much. The actual space equation is calculable, but depends on many factors including how many
zeroes vs non-seroes you need to represent. Furthermore, you should really
normalize that one more step since there are only 3 colors, and no reason to repeat
the text of the color name on each row.
The real issue to be concerned about here is what data representation
makes the most sense in terms of how it's used. In the case of an RGB color
system, the argument could probably be made that there are always exactly 3 color components, and that is unlikely to change, so the separate Red, Green,
and Blue fields make sense.
You don't need a structure that allows more than 3 components, and furthermore, what is the meaning of lacking a component? You're
suggesting that it should mean zero, but that's hardly obvious by the schema. The
lack of a record or a Null generally means there is no meaningful data, not
that there is a meaningful zero value. If you make this explicit by always
having color rows for all 3 components, then you aren't saving anything.
My example database might have been a little unclear. Think of my example as
an inventory type system for the color of items. An item can only have one
color. So in my example row 1, there are 3 boxes total (2 yellow, 1 Red).
There are only the 4 colors possible (items can't have more than one color)
and this will never change. I'm not quite sure I understand what you are
trying to say above. The way the database is setup now (with a column for
each color), there is a zero value in the 'empty' spaces. The reason the
person used zeros were used instead of nulls had something to do with making
math calculations easier. I'm not sure why that is but that is what they
said when I asked about it. If I changed it to have color rows, there
wouldn't be rows for the zero columns.
On Mon, 12 Jul 2004 21:53:50 -0700, "DH" <dana_hartley@(remove)hotmail.com>
wrote: Those of us who have been working with databases for a long time have pretty much ceased worrying about the space issue very much. The actual space equation is calculable, but depends on many factors including how many zeroes vs non-seroes you need to represent. Furthermore, you should really normalize that one more step since there are only 3 colors, and no reason to repeat the text of the color name on each row.
The real issue to be concerned about here is what data representation makes the most sense in terms of how it's used. In the case of an RGB color system, the argument could probably be made that there are always exactly 3 color components, and that is unlikely to change, so the separate Red, Green, and Blue fields make sense.
You don't need a structure that allows more than 3 components, and furthermore, what is the meaning of lacking a component? You're suggesting that it should mean zero, but that's hardly obvious by the schema. The lack of a record or a Null generally means there is no meaningful data, not that there is a meaningful zero value. If you make this explicit by always having color rows for all 3 components, then you aren't saving anything.
My example database might have been a little unclear. Think of my example as an inventory type system for the color of items. An item can only have one color. So in my example row 1, there are 3 boxes total (2 yellow, 1 Red). There are only the 4 colors possible (items can't have more than one color) and this will never change. I'm not quite sure I understand what you are trying to say above. The way the database is setup now (with a column for each color), there is a zero value in the 'empty' spaces. The reason the person used zeros were used instead of nulls had something to do with making math calculations easier. I'm not sure why that is but that is what they said when I asked about it. If I changed it to have color rows, there wouldn't be rows for the zero columns.
Sorry, I was reading too fast, and did not correctly interpret what you were
trying to do. Actually, your assessmet of how to normalize the database was
pretty much on track. It's better normalization to change your design to have
a related detail row for the quantity at each color as you suggest. If you
don't do that, you must change the code if you ever add or remove a color
choice, so the design original is inflexible.
The next step would be to move the color choices to their own table, and use a
surrogate key to point from the color/quantity detail table to the color
table.
tblItemType
=====
*ItemTypeID Autonumber Primary Key
ItemTypeName Text (100) Index - no duplicates
tblItemColor
=====
*ItemColorID Autonumber Primary Key
ItemColorName Text (40) Index - no duplicates
tblItemCount
=====
*ItemTypeID Long Int 1st field of PK
*ItemColorID Long Int 2nd field of PK
ItemQty Long Int
In this arrangement, we say that tblItemCount is a junction table that defines
a many-to-many relationship between item types and item colors. Because the
combination of ItemColorID and ItemTypeID must be unique, there can be only
one record in tblItemCount for each combination, and that record is the single
place that the quantity for that combination is recorded.
Item color name and item type name are each stored exactly once, in one place,
and each can be changed at will without breaking any relational rules. This
means, if you wanted to change Green to Veridian, that's fine. Likewise, if
you need to correct the spelling of "Automoble", that's fine too.
> Sorry, I was reading too fast, and did not correctly interpret what you
were trying to do. Actually, your assessmet of how to normalize the database
was pretty much on track. It's better normalization to change your design to
have a related detail row for the quantity at each color as you suggest. If
you don't do that, you must change the code if you ever add or remove a color choice, so the design original is inflexible.
Good point.
The next step would be to move the color choices to their own table, and
use a surrogate key to point from the color/quantity detail table to the color table.
tblItemType ===== *ItemTypeID Autonumber Primary Key ItemTypeName Text (100) Index - no duplicates
tblItemColor ===== *ItemColorID Autonumber Primary Key ItemColorName Text (40) Index - no duplicates
tblItemCount ===== *ItemTypeID Long Int 1st field of PK *ItemColorID Long Int 2nd field of PK ItemQty Long Int
In this arrangement, we say that tblItemCount is a junction table that
defines a many-to-many relationship between item types and item colors. Because
the combination of ItemColorID and ItemTypeID must be unique, there can be
only one record in tblItemCount for each combination, and that record is the
single place that the quantity for that combination is recorded.
Item color name and item type name are each stored exactly once, in one
place, and each can be changed at will without breaking any relational rules.
This means, if you wanted to change Green to Veridian, that's fine. Likewise,
if you need to correct the spelling of "Automoble", that's fine too.
That makes sense and thanks for your help. However, I'm a little unclear on
what advantage there is in the ItemColor and ItemType tables aside from
being able to add/change colors later. In my case I am 100% certain that the
colors will never change. That being said, is there still an advantage to
the extra table?
And for the Items I don't see the advantage to the ItemType table except for
changing a name like you state. But, isn't this just as easy to do with an
update query on the table to make the change to all occurances of the
misspelled name? It seems to me like the extra tables are adding complexity
without much benefit. What am I missing?
Thanks again.
On Tue, 13 Jul 2004 00:16:39 -0700, "DH" <dana_hartley@(remove)hotmail.com>
wrote: Sorry, I was reading too fast, and did not correctly interpret what you were trying to do. Actually, your assessmet of how to normalize the database was pretty much on track. It's better normalization to change your design to have a related detail row for the quantity at each color as you suggest. If you don't do that, you must change the code if you ever add or remove a color choice, so the design original is inflexible.
Good point.
The next step would be to move the color choices to their own table, and use a surrogate key to point from the color/quantity detail table to the color table.
tblItemType ===== *ItemTypeID Autonumber Primary Key ItemTypeName Text (100) Index - no duplicates
tblItemColor ===== *ItemColorID Autonumber Primary Key ItemColorName Text (40) Index - no duplicates
tblItemCount ===== *ItemTypeID Long Int 1st field of PK *ItemColorID Long Int 2nd field of PK ItemQty Long Int
In this arrangement, we say that tblItemCount is a junction table that defines a many-to-many relationship between item types and item colors. Because the combination of ItemColorID and ItemTypeID must be unique, there can be only one record in tblItemCount for each combination, and that record is the single place that the quantity for that combination is recorded.
Item color name and item type name are each stored exactly once, in one place, and each can be changed at will without breaking any relational rules. This means, if you wanted to change Green to Veridian, that's fine. Likewise, if you need to correct the spelling of "Automoble", that's fine too.
That makes sense and thanks for your help. However, I'm a little unclear on what advantage there is in the ItemColor and ItemType tables aside from being able to add/change colors later. In my case I am 100% certain that the colors will never change. That being said, is there still an advantage to the extra table?
And for the Items I don't see the advantage to the ItemType table except for changing a name like you state. But, isn't this just as easy to do with an update query on the table to make the change to all occurances of the misspelled name? It seems to me like the extra tables are adding complexity without much benefit. What am I missing?
Thanks again.
1. By eliminating duplication, you eliminate the potential for all kinds of
problems or the creation of even more complexity to solve the problems. If,
for instance, you want to prevent bad data from getting into the system, how
do you do it? Do you create data entry rules? If so, you have information
hidden in rules that may be hard to find later. What if the rules become too
long, and hard to maintain? Isn't it nicer to just have a single list of
entries where every valid entry appears exactly one time?
2. What if item types or item colors start to have more than just names? You
need entity tables for them unless you want to start duplicating even more
fields across the detail table. What if types have categories? What if
colors have price modifiers?
Hi Steve, hi Dana, On Tue, 13 Jul 2004 00:16:39 -0700, "DH" <dana_hartley@(remove)hotmail.com> wrote:
Sorry, I was reading too fast, and did not correctly interpret what you were trying to do. Actually, your assessmet of how to normalize the database was pretty much on track. It's better normalization to change your design to have a related detail row for the quantity at each color as you suggest. If you don't do that, you must change the code if you ever add or remove a color choice, so the design original is inflexible.
Good point.
The next step would be to move the color choices to their own table, and use a surrogate key to point from the color/quantity detail table to the color table.
tblItemType ===== *ItemTypeID Autonumber Primary Key ItemTypeName Text (100) Index - no duplicates
tblItemColor ===== *ItemColorID Autonumber Primary Key ItemColorName Text (40) Index - no duplicates
tblItemCount ===== *ItemTypeID Long Int 1st field of PK *ItemColorID Long Int 2nd field of PK ItemQty Long Int
In this arrangement, we say that tblItemCount is a junction table that defines a many-to-many relationship between item types and item colors. Because the combination of ItemColorID and ItemTypeID must be unique, there can be only one record in tblItemCount for each combination, and that record is the single place that the quantity for that combination is recorded.
Item color name and item type name are each stored exactly once, in one place, and each can be changed at will without breaking any relational rules. This means, if you wanted to change Green to Veridian, that's fine. Likewise, if you need to correct the spelling of "Automoble", that's fine too.
That makes sense and thanks for your help. However, I'm a little unclear on what advantage there is in the ItemColor and ItemType tables aside from being able to add/change colors later. In my case I am 100% certain that the colors will never change. That being said, is there still an advantage to the extra table?
And for the Items I don't see the advantage to the ItemType table except for changing a name like you state. But, isn't this just as easy to do with an update query on the table to make the change to all occurances of the misspelled name? It seems to me like the extra tables are adding complexity without much benefit. What am I missing?
Thanks again.
1. By eliminating duplication, you eliminate the potential for all kinds of problems or the creation of even more complexity to solve the problems. If, for instance, you want to prevent bad data from getting into the system, how do you do it? Do you create data entry rules? If so, you have information hidden in rules that may be hard to find later. What if the rules become too long, and hard to maintain? Isn't it nicer to just have a single list of entries where every valid entry appears exactly one time? 2. What if item types or item colors start to have more than just names? You need entity tables for them unless you want to start duplicating even more fields across the detail table. What if types have categories? What if colors have price modifiers?
If you want to have another view at this theme:
We have build a small freeware tool to show the amount of memory the
tables and indexes physically needs:
Access Memory Reporter 1.0 : www.atroplan.com/AccMemReporter.htm
But notice: this tool counts the pages for each table/index, so you
will see no effect of changing the table structure, if you have only a
few records, you need mass records.
Regards
Thilo Immel
The Access Druid (Access Repair Service) www.atroplan.com This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: pescott |
last post by:
Hello people, I could really use your help on this one... I have script
to upload image data to a database, but all that is uploaded is the
decription, not the data BLOB itself. There must be...
|
by: Beyonder |
last post by:
I'm using InnoDB with Mysql and I've used delete to delete over 10
million records, but the InnoDB database is still the same size on the
disk (over 100 gigs) is there a way to get Mysql/InnoDB to...
|
by: dchow |
last post by:
Our database size is currently 4G and is incrementing at a rate of
45M/day. What is the max size of a SQL database? And what is the
size beyond which the server performance will start to go down?
|
by: Marc C |
last post by:
Hello,
I am trying to clean up a database I inherited.
I have an 80GB SQL 2000 database with 20 datafiles each 4096 in size. I
have been able to remove unneeded data and am now trying to clean...
|
by: ThazKool |
last post by:
Is there anyway to write a class or struct that has no storage. It
only operates on a reference to an already existing type.
This is actually an extension to another thread. The thread went a...
|
by: Kums |
last post by:
What is the maximum permissible size of a database? Is there any limitation.
What is the maximum # of tablespace's allowed in a database?
Thanks for your response.
|
by: Clodoaldo Pinto Neto |
last post by:
I suspect there is something wrong because it takes 73s to delete a single line
from a table whith 140 lines. So I tried to reindex the database:
bash-2.05b$ postgres -P -O -D /var/lib/pgsql/data...
|
by: Joey |
last post by:
I don't care what .net language this come in but I really need to determine
if a disk is dynamic or basic. I have posted something in the WMI group but
no one knows how to do it. Does anyone have...
|
by: Atli |
last post by:
This is an easy to digest 12 step guide on basics of using MySQL. It's a great refresher for those who need it and it work's great for first time MySQL users.
Anyone should be able to get...
|
by: lllomh |
last post by:
Define the method first
this.state = {
buttonBackgroundColor: 'green',
isBlinking: false, // A new status is added to identify whether the button is blinking or not
}
autoStart=()=>{
|
by: DJRhino |
last post by:
Was curious if anyone else was having this same issue or not....
I was just Up/Down graded to windows 11 and now my access combo boxes are not acting right. With win 10 I could start typing...
|
by: isladogs |
last post by:
The next Access Europe meeting will be on Wednesday 4 Oct 2023 starting at 18:00 UK time (6PM UTC+1) and finishing at about 19:15 (7.15PM)
The start time is equivalent to 19:00 (7PM) in Central...
|
by: Aliciasmith |
last post by:
In an age dominated by smartphones, having a mobile app for your business is no longer an option; it's a necessity. Whether you're a startup or an established enterprise, finding the right mobile app...
|
by: tracyyun |
last post by:
Hello everyone,
I have a question and would like some advice on network connectivity. I have one computer connected to my router via WiFi, but I have two other computers that I want to be able to...
|
by: NeoPa |
last post by:
Hello everyone.
I find myself stuck trying to find the VBA way to get Access to create a PDF of the currently-selected (and open) object (Form or Report).
I know it can be done by selecting :...
|
by: Teri B |
last post by:
Hi, I have created a sub-form Roles. In my course form the user selects the roles assigned to the course.
0ne-to-many. One course many roles.
Then I created a report based on the Course form and...
|
by: nia12 |
last post by:
Hi there,
I am very new to Access so apologies if any of this is obvious/not clear.
I am creating a data collection tool for health care employees to complete. It consists of a number of...
|
by: NeoPa |
last post by:
Introduction
For this article I'll be focusing on the Report (clsReport) class. This simply handles making the calling Form invisible until all of the Reports opened by it have been closed, when it...
| |