Storage consumption

Hello,

For some very data-intensive projects it's interesting how much space the
DBMS uses for the storage of data, so I'm investigating how space
efficient different DBMSes are.

In the PostgreSQL manual, it's written that values of the type INTEGER
take up four bytes. I was curious about how close to real-world usage this
number is, so I did a test: How much space does PostgreSQL use when
storing 100000 rows where each row consists of a single INTEGER value?

With help from http://random.org/ I created a file with 100000 random
integer insertions. The SQL used to do that is available at
http://troels.arvin.dk/db/tests/stor...randomints.zip

About installation: PostgreSQL v. 7.3.4 on Red Hat Linux 9, file system
ext3. PostgreSQL data-area in /var/lib/pgsql/data.

For this test, PostgreSQL is being used for nothing else.

Before test start:
-----------------
Access to a default database ('psql' brings you right into a working
database) from psql.
Access to do a 'du' (disk usage unix-command) on /var/lib/pgsql/data from
the command line.
No existing table 'inttab' in database. PostgreSQL stopped.

Test starts.
-----------
Output of 'du -sb /var/lib/pgsql/data': 77946519.
Start PostgreSQL.
Do: "CREATE TABLE inttab (intval INT) WITHOUT OIDS;"
psql -q -f random_ints.sql
(Wait for a long time.)
Do: "VACUUM FULL;"
Shut down PostgreSQL.
Output of 'du -sb /var/lib/pgsql/data': 81190551.
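
The before/after measurement above can be sketched in Python. This is a
minimal sketch, not the script used in the test: directory_size_bytes and
measure_growth are hypothetical helper names, and load_data stands in for
the manual steps (start the server, create the table, load the rows,
VACUUM FULL, shut down). It sums apparent file sizes in the way
'du -sb' reports them, but it does not count the directory entries
themselves, so absolute totals can differ slightly from du.

```python
import os


def directory_size_bytes(root):
    """Sum the apparent sizes (in bytes) of all regular files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so a link's target is not counted here.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def measure_growth(root, load_data):
    """Return how many bytes root grew by while load_data() ran.

    load_data is a caller-supplied callable (hypothetical here) that
    performs the whole load and leaves the DBMS stopped afterwards.
    """
    before = directory_size_bytes(root)
    load_data()
    after = directory_size_bytes(root)
    return after - before
```

Measuring with the server stopped, as in the steps above, avoids counting
files that are still being written.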

Result:
------
Real difference: 81190551-77946519 = 3244032
Optimal difference: 100000*4 = 400000
Storage consumption rate ((real/optimal)*100)% = 811%
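
The arithmetic can be checked with a few lines of Python. The two du
figures and the row count are taken from the test above; the 8192-byte
page size is an assumption (PostgreSQL's default block size), not
something that was measured here.

```python
BEFORE_BYTES = 77946519  # du output before the test
AFTER_BYTES = 81190551   # du output after the test
ROWS = 100000            # number of inserted rows
INT_BYTES = 4            # documented size of one INTEGER value
PAGE_BYTES = 8192        # assumption: default PostgreSQL block size

growth = AFTER_BYTES - BEFORE_BYTES
optimal = ROWS * INT_BYTES
ratio_percent = growth / optimal * 100
bytes_per_row = growth / ROWS
pages, remainder = divmod(growth, PAGE_BYTES)

print(f"real difference:    {growth} bytes")
print(f"optimal difference: {optimal} bytes")
print(f"consumption rate:   {ratio_percent:.0f}%")
print(f"bytes per row:      {bytes_per_row:.2f}")
print(f"8 KB pages:         {pages} (remainder {remainder})")
```

Seen per row, the growth is about 32.4 bytes, i.e. roughly 28 bytes
beyond the four-byte value itself. Under the page-size assumption the
growth is also exactly 396 pages, which suggests that the space was
allocated in whole pages.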

I'm surprised by an overhead _that_ high. Any comments on my methodology?
Does it need adjustments? If you think it's rotten: What methodology would
you use to measure space overhead for a DBMS? (Again: Space overhead is
seldom interesting, but sometimes it is.)

I guess that transaction log files are a wild card in this context, but then
again: A number which reflects the DBMS' disk usage before and after an
operation does have real-world meaning, I think.

(Of course, I'll need another methodology for DBMSes which preallocate a
fixed amount of storage for a database.)

--
Greetings from Troels Arvin, Copenhagen, Denmark

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 12 '05 #1

Did you see the FAQ item on estimating disk space?

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.ph a.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Nov 12 '05 #2


On Fri, 14 Nov 2003, Troels Arvin wrote:
Hello,

For some very data-intensive projects it's interesting how much space the
DBMS uses for the storage of data, so I'm investigating how space
efficient different DBMSes are.

In the PostgreSQL manual, it's written that values of the type INTEGER
take up four bytes. I was curious about how close to real-world usage this
number is, so I did a test: How much space does PostgreSQL use when
storing 100000 rows where each row consists of a single INTEGER value?


You are measuring the space used to store one row containing one int
column. To isolate the space used by the int column alone, a more accurate
test would be to measure the difference in disk usage between a table with
one int column and a table with two int columns.

Kris Jurka
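
The differential test suggested above can be sketched as follows. This is
a minimal sketch under stated assumptions: the function names are
hypothetical, and both sizes are assumed to be the measured disk growth
for the same number of rows, once with a one-column table and once with a
two-column table.

```python
def marginal_column_cost(size_one_col, size_two_cols, rows):
    """Per-row bytes attributable to the second int column.

    size_one_col and size_two_cols are the measured disk growth (bytes)
    for tables with one and two int columns, loaded with `rows` rows each.
    """
    if rows <= 0:
        raise ValueError("rows must be positive")
    return (size_two_cols - size_one_col) / rows


def fixed_row_overhead(size_one_col, size_two_cols, rows):
    """Per-row bytes that remain after subtracting one column's cost."""
    per_column = marginal_column_cost(size_one_col, size_two_cols, rows)
    return size_one_col / rows - per_column
```

Dividing the difference by the row count cancels the per-row overhead that
both tables share. Padding or alignment could still make the marginal
figure differ from the nominal four bytes, so it is best read as a
measurement rather than a confirmation.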

Nov 12 '05 #3
