By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
459,473 Members | 1,278 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 459,473 IT Pros & Developers. It's quick & easy.

Basic question on database size - does an empty row take up space?

P: n/a
DH
I have a VERY basic question about figuring database size. I've inherited a
database which is generally similar to this basic one:

Item, Red, Blue, Green, Yellow
(text), (int),(int),(int),(int)
box, 1,0,0,2
hat, 0,0,0,1
car, 3,0,0,0

This format leads to a lot of zeros in the rows which take up a lot of
space. Say I change it to this format:

Item, Color, Number
(text), (text), (int)
box, Red, 1
box, Yellow, 2
hat, Yellow, 1
car, Red, 3

There will be more rows this way (since there needs to be an extra row for
the 2nd box color) but not as many columns.

My question is how much space does this save. Each int number takes up 4
bytes, right? My question is how much space the row itself takes up. Another
way to ask is how much space, if any, does the row itself take up in the
database?

Thanks.
Nov 13 '05 #1
Share this Question
Share on Google+
6 Replies


P: n/a
On Mon, 12 Jul 2004 20:18:04 -0700, "DH" <dana_hartley@(remove)hotmail.com>
wrote:
I have a VERY basic question about figuring database size. I've inherited a
database which is generally similar to this basic one:

Item, Red, Blue, Green, Yellow
(text), (int),(int),(int),(int)
box, 1,0,0,2
hat, 0,0,0,1
car, 3,0,0,0

This format leads to a lot of zeros in the rows which take up a lot of
space. Say I change it to this format:

Item, Color, Number
(text), (text), (int)
box, Red, 1
box, Yellow, 2
hat, Yellow, 1
car, Red, 3

There will be more rows this way (since there needs to be an extra row for
the 2nd box color) but not as many columns.

My question is how much space does this save. Each int number takes up 4
bytes, right? My question is how much space the row itself takes up. Another
way to ask is how much space, if any, does the row itself take up in the
database?

Thanks.


Those of us who have been working with databases for a long time have pretty
much ceased worrying about the space issue very much. The actual space
equation is calculable, but depends on many factors including how many zeroes
vs non-seroes you need to represent. Furthermore, you should really normalize
that one more step since there are only 3 colors, and no reason to repeat the
text of the color name on each row.

The real issue to be concerned about here is what data representation makes
the most sense in terms of how it's used. In the case of an RGB color system,
the argument could probably be made that there are always exactly 3 color
components, and that is unlikely to change, so the separate Red, Green, and
Blue fields make sense.

You don't need a structure that allows more than 3 components, and
furthermore, what is the meaning of lacking a component? You're suggesting
that it should mean zero, but that's hardly obvious by the schema. The lack
of a record or a Null generally means there is no meaningful data, not that
there is a meaningful zero value. If you make this explicit by always having
color rows for all 3 components, then you aren't saving anything.
Nov 13 '05 #2

P: n/a
DH
> Those of us who have been working with databases for a long time have
pretty
much ceased worrying about the space issue very much. The actual space
equation is calculable, but depends on many factors including how many zeroes vs non-seroes you need to represent. Furthermore, you should really normalize that one more step since there are only 3 colors, and no reason to repeat the text of the color name on each row.

The real issue to be concerned about here is what data representation makes the most sense in terms of how it's used. In the case of an RGB color system, the argument could probably be made that there are always exactly 3 color
components, and that is unlikely to change, so the separate Red, Green, and Blue fields make sense.

You don't need a structure that allows more than 3 components, and
furthermore, what is the meaning of lacking a component? You're suggesting that it should mean zero, but that's hardly obvious by the schema. The lack of a record or a Null generally means there is no meaningful data, not that there is a meaningful zero value. If you make this explicit by always having color rows for all 3 components, then you aren't saving anything.


My example database might have been a little unclear. Think of my example as
an inventory type system for the color of items. An item can only have one
color. So in my example row 1, there are 3 boxes total (2 yellow, 1 Red).
There are only the 4 colors possible (items can't have more than one color)
and this will never change. I'm not quite sure I understand what you are
trying to say above. The way the database is setup now (with a column for
each color), there is a zero value in the 'empty' spaces. The reason the
person used zeros were used instead of nulls had something to do with making
math calculations easier. I'm not sure why that is but that is what they
said when I asked about it. If I changed it to have color rows, there
wouldn't be rows for the zero columns.


Nov 13 '05 #3

P: n/a
On Mon, 12 Jul 2004 21:53:50 -0700, "DH" <dana_hartley@(remove)hotmail.com>
wrote:
Those of us who have been working with databases for a long time have

pretty
much ceased worrying about the space issue very much. The actual space
equation is calculable, but depends on many factors including how many

zeroes
vs non-seroes you need to represent. Furthermore, you should really

normalize
that one more step since there are only 3 colors, and no reason to repeat

the
text of the color name on each row.

The real issue to be concerned about here is what data representation

makes
the most sense in terms of how it's used. In the case of an RGB color

system,
the argument could probably be made that there are always exactly 3 color
components, and that is unlikely to change, so the separate Red, Green,

and
Blue fields make sense.

You don't need a structure that allows more than 3 components, and
furthermore, what is the meaning of lacking a component? You're

suggesting
that it should mean zero, but that's hardly obvious by the schema. The

lack
of a record or a Null generally means there is no meaningful data, not

that
there is a meaningful zero value. If you make this explicit by always

having
color rows for all 3 components, then you aren't saving anything.


My example database might have been a little unclear. Think of my example as
an inventory type system for the color of items. An item can only have one
color. So in my example row 1, there are 3 boxes total (2 yellow, 1 Red).
There are only the 4 colors possible (items can't have more than one color)
and this will never change. I'm not quite sure I understand what you are
trying to say above. The way the database is setup now (with a column for
each color), there is a zero value in the 'empty' spaces. The reason the
person used zeros were used instead of nulls had something to do with making
math calculations easier. I'm not sure why that is but that is what they
said when I asked about it. If I changed it to have color rows, there
wouldn't be rows for the zero columns.


Sorry, I was reading too fast, and did not correctly interpret what you were
trying to do. Actually, your assessmet of how to normalize the database was
pretty much on track. It's better normalization to change your design to have
a related detail row for the quantity at each color as you suggest. If you
don't do that, you must change the code if you ever add or remove a color
choice, so the design original is inflexible.

The next step would be to move the color choices to their own table, and use a
surrogate key to point from the color/quantity detail table to the color
table.

tblItemType
=====
*ItemTypeID Autonumber Primary Key
ItemTypeName Text (100) Index - no duplicates

tblItemColor
=====
*ItemColorID Autonumber Primary Key
ItemColorName Text (40) Index - no duplicates

tblItemCount
=====
*ItemTypeID Long Int 1st field of PK
*ItemColorID Long Int 2nd field of PK
ItemQty Long Int

In this arrangement, we say that tblItemCount is a junction table that defines
a many-to-many relationship between item types and item colors. Because the
combination of ItemColorID and ItemTypeID must be unique, there can be only
one record in tblItemCount for each combination, and that record is the single
place that the quantity for that combination is recorded.

Item color name and item type name are each stored exactly once, in one place,
and each can be changed at will without breaking any relational rules. This
means, if you wanted to change Green to Veridian, that's fine. Likewise, if
you need to correct the spelling of "Automoble", that's fine too.

Nov 13 '05 #4

P: n/a
DH
> Sorry, I was reading too fast, and did not correctly interpret what you
were
trying to do. Actually, your assessmet of how to normalize the database was pretty much on track. It's better normalization to change your design to have a related detail row for the quantity at each color as you suggest. If you don't do that, you must change the code if you ever add or remove a color
choice, so the design original is inflexible.

Good point.
The next step would be to move the color choices to their own table, and use a surrogate key to point from the color/quantity detail table to the color
table.

tblItemType
=====
*ItemTypeID Autonumber Primary Key
ItemTypeName Text (100) Index - no duplicates

tblItemColor
=====
*ItemColorID Autonumber Primary Key
ItemColorName Text (40) Index - no duplicates

tblItemCount
=====
*ItemTypeID Long Int 1st field of PK
*ItemColorID Long Int 2nd field of PK
ItemQty Long Int

In this arrangement, we say that tblItemCount is a junction table that defines a many-to-many relationship between item types and item colors. Because the combination of ItemColorID and ItemTypeID must be unique, there can be only one record in tblItemCount for each combination, and that record is the single place that the quantity for that combination is recorded.

Item color name and item type name are each stored exactly once, in one place, and each can be changed at will without breaking any relational rules. This means, if you wanted to change Green to Veridian, that's fine. Likewise, if you need to correct the spelling of "Automoble", that's fine too.


That makes sense and thanks for your help. However, I'm a little unclear on
what advantage there is in the ItemColor and ItemType tables aside from
being able to add/change colors later. In my case I am 100% certain that the
colors will never change. That being said, is there still an advantage to
the extra table?

And for the Items I don't see the advantage to the ItemType table except for
changing a name like you state. But, isn't this just as easy to do with an
update query on the table to make the change to all occurances of the
misspelled name? It seems to me like the extra tables are adding complexity
without much benefit. What am I missing?

Thanks again.
Nov 13 '05 #5

P: n/a
On Tue, 13 Jul 2004 00:16:39 -0700, "DH" <dana_hartley@(remove)hotmail.com>
wrote:
Sorry, I was reading too fast, and did not correctly interpret what you

were
trying to do. Actually, your assessmet of how to normalize the database

was
pretty much on track. It's better normalization to change your design to

have
a related detail row for the quantity at each color as you suggest. If

you
don't do that, you must change the code if you ever add or remove a color
choice, so the design original is inflexible.


Good point.
The next step would be to move the color choices to their own table, and

use a
surrogate key to point from the color/quantity detail table to the color
table.

tblItemType
=====
*ItemTypeID Autonumber Primary Key
ItemTypeName Text (100) Index - no duplicates

tblItemColor
=====
*ItemColorID Autonumber Primary Key
ItemColorName Text (40) Index - no duplicates

tblItemCount
=====
*ItemTypeID Long Int 1st field of PK
*ItemColorID Long Int 2nd field of PK
ItemQty Long Int

In this arrangement, we say that tblItemCount is a junction table that

defines
a many-to-many relationship between item types and item colors. Because

the
combination of ItemColorID and ItemTypeID must be unique, there can be

only
one record in tblItemCount for each combination, and that record is the

single
place that the quantity for that combination is recorded.

Item color name and item type name are each stored exactly once, in one

place,
and each can be changed at will without breaking any relational rules.

This
means, if you wanted to change Green to Veridian, that's fine. Likewise,

if
you need to correct the spelling of "Automoble", that's fine too.


That makes sense and thanks for your help. However, I'm a little unclear on
what advantage there is in the ItemColor and ItemType tables aside from
being able to add/change colors later. In my case I am 100% certain that the
colors will never change. That being said, is there still an advantage to
the extra table?

And for the Items I don't see the advantage to the ItemType table except for
changing a name like you state. But, isn't this just as easy to do with an
update query on the table to make the change to all occurances of the
misspelled name? It seems to me like the extra tables are adding complexity
without much benefit. What am I missing?

Thanks again.


1. By eliminating duplication, you eliminate the potential for all kinds of
problems or the creation of even more complexity to solve the problems. If,
for instance, you want to prevent bad data from getting into the system, how
do you do it? Do you create data entry rules? If so, you have information
hidden in rules that may be hard to find later. What if the rules become too
long, and hard to maintain? Isn't it nicer to just have a single list of
entries where every valid entry appears exactly one time?
2. What if item types or item colors start to have more than just names? You
need entity tables for them unless you want to start duplicating even more
fields across the detail table. What if types have categories? What if
colors have price modifiers?
Nov 13 '05 #6

P: n/a
Hi Steve, hi Dana,
On Tue, 13 Jul 2004 00:16:39 -0700, "DH" <dana_hartley@(remove)hotmail.com>
wrote:
Sorry, I was reading too fast, and did not correctly interpret what you

were
trying to do. Actually, your assessmet of how to normalize the database

was
pretty much on track. It's better normalization to change your design to

have
a related detail row for the quantity at each color as you suggest. If

you
don't do that, you must change the code if you ever add or remove a color
choice, so the design original is inflexible.


Good point.
The next step would be to move the color choices to their own table, and

use a
surrogate key to point from the color/quantity detail table to the color
table.

tblItemType
=====
*ItemTypeID Autonumber Primary Key
ItemTypeName Text (100) Index - no duplicates

tblItemColor
=====
*ItemColorID Autonumber Primary Key
ItemColorName Text (40) Index - no duplicates

tblItemCount
=====
*ItemTypeID Long Int 1st field of PK
*ItemColorID Long Int 2nd field of PK
ItemQty Long Int

In this arrangement, we say that tblItemCount is a junction table that

defines
a many-to-many relationship between item types and item colors. Because

the
combination of ItemColorID and ItemTypeID must be unique, there can be

only
one record in tblItemCount for each combination, and that record is the

single
place that the quantity for that combination is recorded.

Item color name and item type name are each stored exactly once, in one

place,
and each can be changed at will without breaking any relational rules.

This
means, if you wanted to change Green to Veridian, that's fine. Likewise,

if
you need to correct the spelling of "Automoble", that's fine too.


That makes sense and thanks for your help. However, I'm a little unclear on
what advantage there is in the ItemColor and ItemType tables aside from
being able to add/change colors later. In my case I am 100% certain that the
colors will never change. That being said, is there still an advantage to
the extra table?

And for the Items I don't see the advantage to the ItemType table except for
changing a name like you state. But, isn't this just as easy to do with an
update query on the table to make the change to all occurances of the
misspelled name? It seems to me like the extra tables are adding complexity
without much benefit. What am I missing?

Thanks again.


1. By eliminating duplication, you eliminate the potential for all kinds of
problems or the creation of even more complexity to solve the problems. If,
for instance, you want to prevent bad data from getting into the system, how
do you do it? Do you create data entry rules? If so, you have information
hidden in rules that may be hard to find later. What if the rules become too
long, and hard to maintain? Isn't it nicer to just have a single list of
entries where every valid entry appears exactly one time?
2. What if item types or item colors start to have more than just names? You
need entity tables for them unless you want to start duplicating even more
fields across the detail table. What if types have categories? What if
colors have price modifiers?


If you want to have another view at this theme:

We have build a small freeware tool to show the amount of memory the
tables and indexes physically needs:

Access Memory Reporter 1.0 : www.atroplan.com/AccMemReporter.htm

But notice: this tool counts the pages for each table/index, so you
will see no effect of changing the table structure, if you have only a
few records, you need mass records.
Regards

Thilo Immel
The Access Druid (Access Repair Service)
www.atroplan.com

Nov 13 '05 #7

This discussion thread is closed

Replies have been disabled for this discussion.