472,971 Members | 1,921 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,971 software developers and data experts.

Basic question on database size - does an empty row take up space?

DH
I have a VERY basic question about figuring database size. I've inherited a
database which is generally similar to this basic one:

Item, Red, Blue, Green, Yellow
(text), (int),(int),(int),(int)
box, 1,0,0,2
hat, 0,0,0,1
car, 3,0,0,0

This format leads to a lot of zeros in the rows which take up a lot of
space. Say I change it to this format:

Item, Color, Number
(text), (text), (int)
box, Red, 1
box, Yellow, 2
hat, Yellow, 1
car, Red, 3

There will be more rows this way (since there needs to be an extra row for
the 2nd box color) but not as many columns.

My question is how much space does this save. Each int number takes up 4
bytes, right? My question is how much space the row itself takes up. Another
way to ask is how much space, if any, does the row itself take up in the
database?

Thanks.
Nov 13 '05 #1
6 5063
On Mon, 12 Jul 2004 20:18:04 -0700, "DH" <dana_hartley@(remove)hotmail.com>
wrote:
I have a VERY basic question about figuring database size. I've inherited a
database which is generally similar to this basic one:

Item, Red, Blue, Green, Yellow
(text), (int),(int),(int),(int)
box, 1,0,0,2
hat, 0,0,0,1
car, 3,0,0,0

This format leads to a lot of zeros in the rows which take up a lot of
space. Say I change it to this format:

Item, Color, Number
(text), (text), (int)
box, Red, 1
box, Yellow, 2
hat, Yellow, 1
car, Red, 3

There will be more rows this way (since there needs to be an extra row for
the 2nd box color) but not as many columns.

My question is how much space does this save. Each int number takes up 4
bytes, right? My question is how much space the row itself takes up. Another
way to ask is how much space, if any, does the row itself take up in the
database?

Thanks.


Those of us who have been working with databases for a long time have pretty
much ceased worrying about the space issue very much. The actual space
equation is calculable, but depends on many factors including how many zeroes
vs non-seroes you need to represent. Furthermore, you should really normalize
that one more step since there are only 3 colors, and no reason to repeat the
text of the color name on each row.

The real issue to be concerned about here is what data representation makes
the most sense in terms of how it's used. In the case of an RGB color system,
the argument could probably be made that there are always exactly 3 color
components, and that is unlikely to change, so the separate Red, Green, and
Blue fields make sense.

You don't need a structure that allows more than 3 components, and
furthermore, what is the meaning of lacking a component? You're suggesting
that it should mean zero, but that's hardly obvious by the schema. The lack
of a record or a Null generally means there is no meaningful data, not that
there is a meaningful zero value. If you make this explicit by always having
color rows for all 3 components, then you aren't saving anything.
Nov 13 '05 #2
DH
> Those of us who have been working with databases for a long time have
pretty
much ceased worrying about the space issue very much. The actual space
equation is calculable, but depends on many factors including how many zeroes vs non-seroes you need to represent. Furthermore, you should really normalize that one more step since there are only 3 colors, and no reason to repeat the text of the color name on each row.

The real issue to be concerned about here is what data representation makes the most sense in terms of how it's used. In the case of an RGB color system, the argument could probably be made that there are always exactly 3 color
components, and that is unlikely to change, so the separate Red, Green, and Blue fields make sense.

You don't need a structure that allows more than 3 components, and
furthermore, what is the meaning of lacking a component? You're suggesting that it should mean zero, but that's hardly obvious by the schema. The lack of a record or a Null generally means there is no meaningful data, not that there is a meaningful zero value. If you make this explicit by always having color rows for all 3 components, then you aren't saving anything.


My example database might have been a little unclear. Think of my example as
an inventory type system for the color of items. An item can only have one
color. So in my example row 1, there are 3 boxes total (2 yellow, 1 Red).
There are only the 4 colors possible (items can't have more than one color)
and this will never change. I'm not quite sure I understand what you are
trying to say above. The way the database is setup now (with a column for
each color), there is a zero value in the 'empty' spaces. The reason the
person used zeros were used instead of nulls had something to do with making
math calculations easier. I'm not sure why that is but that is what they
said when I asked about it. If I changed it to have color rows, there
wouldn't be rows for the zero columns.


Nov 13 '05 #3
On Mon, 12 Jul 2004 21:53:50 -0700, "DH" <dana_hartley@(remove)hotmail.com>
wrote:
Those of us who have been working with databases for a long time have

pretty
much ceased worrying about the space issue very much. The actual space
equation is calculable, but depends on many factors including how many

zeroes
vs non-seroes you need to represent. Furthermore, you should really

normalize
that one more step since there are only 3 colors, and no reason to repeat

the
text of the color name on each row.

The real issue to be concerned about here is what data representation

makes
the most sense in terms of how it's used. In the case of an RGB color

system,
the argument could probably be made that there are always exactly 3 color
components, and that is unlikely to change, so the separate Red, Green,

and
Blue fields make sense.

You don't need a structure that allows more than 3 components, and
furthermore, what is the meaning of lacking a component? You're

suggesting
that it should mean zero, but that's hardly obvious by the schema. The

lack
of a record or a Null generally means there is no meaningful data, not

that
there is a meaningful zero value. If you make this explicit by always

having
color rows for all 3 components, then you aren't saving anything.


My example database might have been a little unclear. Think of my example as
an inventory type system for the color of items. An item can only have one
color. So in my example row 1, there are 3 boxes total (2 yellow, 1 Red).
There are only the 4 colors possible (items can't have more than one color)
and this will never change. I'm not quite sure I understand what you are
trying to say above. The way the database is setup now (with a column for
each color), there is a zero value in the 'empty' spaces. The reason the
person used zeros were used instead of nulls had something to do with making
math calculations easier. I'm not sure why that is but that is what they
said when I asked about it. If I changed it to have color rows, there
wouldn't be rows for the zero columns.


Sorry, I was reading too fast, and did not correctly interpret what you were
trying to do. Actually, your assessmet of how to normalize the database was
pretty much on track. It's better normalization to change your design to have
a related detail row for the quantity at each color as you suggest. If you
don't do that, you must change the code if you ever add or remove a color
choice, so the design original is inflexible.

The next step would be to move the color choices to their own table, and use a
surrogate key to point from the color/quantity detail table to the color
table.

tblItemType
=====
*ItemTypeID Autonumber Primary Key
ItemTypeName Text (100) Index - no duplicates

tblItemColor
=====
*ItemColorID Autonumber Primary Key
ItemColorName Text (40) Index - no duplicates

tblItemCount
=====
*ItemTypeID Long Int 1st field of PK
*ItemColorID Long Int 2nd field of PK
ItemQty Long Int

In this arrangement, we say that tblItemCount is a junction table that defines
a many-to-many relationship between item types and item colors. Because the
combination of ItemColorID and ItemTypeID must be unique, there can be only
one record in tblItemCount for each combination, and that record is the single
place that the quantity for that combination is recorded.

Item color name and item type name are each stored exactly once, in one place,
and each can be changed at will without breaking any relational rules. This
means, if you wanted to change Green to Veridian, that's fine. Likewise, if
you need to correct the spelling of "Automoble", that's fine too.

Nov 13 '05 #4
DH
> Sorry, I was reading too fast, and did not correctly interpret what you
were
trying to do. Actually, your assessmet of how to normalize the database was pretty much on track. It's better normalization to change your design to have a related detail row for the quantity at each color as you suggest. If you don't do that, you must change the code if you ever add or remove a color
choice, so the design original is inflexible.

Good point.
The next step would be to move the color choices to their own table, and use a surrogate key to point from the color/quantity detail table to the color
table.

tblItemType
=====
*ItemTypeID Autonumber Primary Key
ItemTypeName Text (100) Index - no duplicates

tblItemColor
=====
*ItemColorID Autonumber Primary Key
ItemColorName Text (40) Index - no duplicates

tblItemCount
=====
*ItemTypeID Long Int 1st field of PK
*ItemColorID Long Int 2nd field of PK
ItemQty Long Int

In this arrangement, we say that tblItemCount is a junction table that defines a many-to-many relationship between item types and item colors. Because the combination of ItemColorID and ItemTypeID must be unique, there can be only one record in tblItemCount for each combination, and that record is the single place that the quantity for that combination is recorded.

Item color name and item type name are each stored exactly once, in one place, and each can be changed at will without breaking any relational rules. This means, if you wanted to change Green to Veridian, that's fine. Likewise, if you need to correct the spelling of "Automoble", that's fine too.


That makes sense and thanks for your help. However, I'm a little unclear on
what advantage there is in the ItemColor and ItemType tables aside from
being able to add/change colors later. In my case I am 100% certain that the
colors will never change. That being said, is there still an advantage to
the extra table?

And for the Items I don't see the advantage to the ItemType table except for
changing a name like you state. But, isn't this just as easy to do with an
update query on the table to make the change to all occurances of the
misspelled name? It seems to me like the extra tables are adding complexity
without much benefit. What am I missing?

Thanks again.
Nov 13 '05 #5
On Tue, 13 Jul 2004 00:16:39 -0700, "DH" <dana_hartley@(remove)hotmail.com>
wrote:
Sorry, I was reading too fast, and did not correctly interpret what you

were
trying to do. Actually, your assessmet of how to normalize the database

was
pretty much on track. It's better normalization to change your design to

have
a related detail row for the quantity at each color as you suggest. If

you
don't do that, you must change the code if you ever add or remove a color
choice, so the design original is inflexible.


Good point.
The next step would be to move the color choices to their own table, and

use a
surrogate key to point from the color/quantity detail table to the color
table.

tblItemType
=====
*ItemTypeID Autonumber Primary Key
ItemTypeName Text (100) Index - no duplicates

tblItemColor
=====
*ItemColorID Autonumber Primary Key
ItemColorName Text (40) Index - no duplicates

tblItemCount
=====
*ItemTypeID Long Int 1st field of PK
*ItemColorID Long Int 2nd field of PK
ItemQty Long Int

In this arrangement, we say that tblItemCount is a junction table that

defines
a many-to-many relationship between item types and item colors. Because

the
combination of ItemColorID and ItemTypeID must be unique, there can be

only
one record in tblItemCount for each combination, and that record is the

single
place that the quantity for that combination is recorded.

Item color name and item type name are each stored exactly once, in one

place,
and each can be changed at will without breaking any relational rules.

This
means, if you wanted to change Green to Veridian, that's fine. Likewise,

if
you need to correct the spelling of "Automoble", that's fine too.


That makes sense and thanks for your help. However, I'm a little unclear on
what advantage there is in the ItemColor and ItemType tables aside from
being able to add/change colors later. In my case I am 100% certain that the
colors will never change. That being said, is there still an advantage to
the extra table?

And for the Items I don't see the advantage to the ItemType table except for
changing a name like you state. But, isn't this just as easy to do with an
update query on the table to make the change to all occurances of the
misspelled name? It seems to me like the extra tables are adding complexity
without much benefit. What am I missing?

Thanks again.


1. By eliminating duplication, you eliminate the potential for all kinds of
problems or the creation of even more complexity to solve the problems. If,
for instance, you want to prevent bad data from getting into the system, how
do you do it? Do you create data entry rules? If so, you have information
hidden in rules that may be hard to find later. What if the rules become too
long, and hard to maintain? Isn't it nicer to just have a single list of
entries where every valid entry appears exactly one time?
2. What if item types or item colors start to have more than just names? You
need entity tables for them unless you want to start duplicating even more
fields across the detail table. What if types have categories? What if
colors have price modifiers?
Nov 13 '05 #6
Hi Steve, hi Dana,
On Tue, 13 Jul 2004 00:16:39 -0700, "DH" <dana_hartley@(remove)hotmail.com>
wrote:
Sorry, I was reading too fast, and did not correctly interpret what you

were
trying to do. Actually, your assessmet of how to normalize the database

was
pretty much on track. It's better normalization to change your design to

have
a related detail row for the quantity at each color as you suggest. If

you
don't do that, you must change the code if you ever add or remove a color
choice, so the design original is inflexible.


Good point.
The next step would be to move the color choices to their own table, and

use a
surrogate key to point from the color/quantity detail table to the color
table.

tblItemType
=====
*ItemTypeID Autonumber Primary Key
ItemTypeName Text (100) Index - no duplicates

tblItemColor
=====
*ItemColorID Autonumber Primary Key
ItemColorName Text (40) Index - no duplicates

tblItemCount
=====
*ItemTypeID Long Int 1st field of PK
*ItemColorID Long Int 2nd field of PK
ItemQty Long Int

In this arrangement, we say that tblItemCount is a junction table that

defines
a many-to-many relationship between item types and item colors. Because

the
combination of ItemColorID and ItemTypeID must be unique, there can be

only
one record in tblItemCount for each combination, and that record is the

single
place that the quantity for that combination is recorded.

Item color name and item type name are each stored exactly once, in one

place,
and each can be changed at will without breaking any relational rules.

This
means, if you wanted to change Green to Veridian, that's fine. Likewise,

if
you need to correct the spelling of "Automoble", that's fine too.


That makes sense and thanks for your help. However, I'm a little unclear on
what advantage there is in the ItemColor and ItemType tables aside from
being able to add/change colors later. In my case I am 100% certain that the
colors will never change. That being said, is there still an advantage to
the extra table?

And for the Items I don't see the advantage to the ItemType table except for
changing a name like you state. But, isn't this just as easy to do with an
update query on the table to make the change to all occurances of the
misspelled name? It seems to me like the extra tables are adding complexity
without much benefit. What am I missing?

Thanks again.


1. By eliminating duplication, you eliminate the potential for all kinds of
problems or the creation of even more complexity to solve the problems. If,
for instance, you want to prevent bad data from getting into the system, how
do you do it? Do you create data entry rules? If so, you have information
hidden in rules that may be hard to find later. What if the rules become too
long, and hard to maintain? Isn't it nicer to just have a single list of
entries where every valid entry appears exactly one time?
2. What if item types or item colors start to have more than just names? You
need entity tables for them unless you want to start duplicating even more
fields across the detail table. What if types have categories? What if
colors have price modifiers?


If you want to have another view at this theme:

We have build a small freeware tool to show the amount of memory the
tables and indexes physically needs:

Access Memory Reporter 1.0 : www.atroplan.com/AccMemReporter.htm

But notice: this tool counts the pages for each table/index, so you
will see no effect of changing the table structure, if you have only a
few records, you need mass records.
Regards

Thilo Immel
The Access Druid (Access Repair Service)
www.atroplan.com

Nov 13 '05 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
by: pescott | last post by:
Hello people, I could really use your help on this one... I have script to upload image data to a database, but all that is uploaded is the decription, not the data BLOB itself. There must be...
1
by: Beyonder | last post by:
I'm using InnoDB with Mysql and I've used delete to delete over 10 million records, but the InnoDB database is still the same size on the disk (over 100 gigs) is there a way to get Mysql/InnoDB to...
19
by: dchow | last post by:
Our database size is currently 4G and is incrementing at a rate of 45M/day. What is the max size of a SQL database? And what is the size beyond which the server performance will start to go down?
2
by: Marc C | last post by:
Hello, I am trying to clean up a database I inherited. I have an 80GB SQL 2000 database with 20 datafiles each 4096 in size. I have been able to remove unneeded data and am now trying to clean...
3
by: ThazKool | last post by:
Is there anyway to write a class or struct that has no storage. It only operates on a reference to an already existing type. This is actually an extension to another thread. The thread went a...
2
by: Kums | last post by:
What is the maximum permissible size of a database? Is there any limitation. What is the maximum # of tablespace's allowed in a database? Thanks for your response.
5
by: Clodoaldo Pinto Neto | last post by:
I suspect there is something wrong because it takes 73s to delete a single line from a table whith 140 lines. So I tried to reindex the database: bash-2.05b$ postgres -P -O -D /var/lib/pgsql/data...
7
by: Joey | last post by:
I don't care what .net language this come in but I really need to determine if a disk is dynamic or basic. I have posted something in the WMI group but no one knows how to do it. Does anyone have...
6
Atli
by: Atli | last post by:
This is an easy to digest 12 step guide on basics of using MySQL. It's a great refresher for those who need it and it work's great for first time MySQL users. Anyone should be able to get...
0
by: lllomh | last post by:
Define the method first this.state = { buttonBackgroundColor: 'green', isBlinking: false, // A new status is added to identify whether the button is blinking or not } autoStart=()=>{
2
by: DJRhino | last post by:
Was curious if anyone else was having this same issue or not.... I was just Up/Down graded to windows 11 and now my access combo boxes are not acting right. With win 10 I could start typing...
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 4 Oct 2023 starting at 18:00 UK time (6PM UTC+1) and finishing at about 19:15 (7.15PM) The start time is equivalent to 19:00 (7PM) in Central...
0
by: Aliciasmith | last post by:
In an age dominated by smartphones, having a mobile app for your business is no longer an option; it's a necessity. Whether you're a startup or an established enterprise, finding the right mobile app...
0
tracyyun
by: tracyyun | last post by:
Hello everyone, I have a question and would like some advice on network connectivity. I have one computer connected to my router via WiFi, but I have two other computers that I want to be able to...
4
NeoPa
by: NeoPa | last post by:
Hello everyone. I find myself stuck trying to find the VBA way to get Access to create a PDF of the currently-selected (and open) object (Form or Report). I know it can be done by selecting :...
1
by: Teri B | last post by:
Hi, I have created a sub-form Roles. In my course form the user selects the roles assigned to the course. 0ne-to-many. One course many roles. Then I created a report based on the Course form and...
3
by: nia12 | last post by:
Hi there, I am very new to Access so apologies if any of this is obvious/not clear. I am creating a data collection tool for health care employees to complete. It consists of a number of...
0
NeoPa
by: NeoPa | last post by:
Introduction For this article I'll be focusing on the Report (clsReport) class. This simply handles making the calling Form invisible until all of the Reports opened by it have been closed, when it...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.