Help | Site Map
Connecting Tech Pros Worldwide
 
 
LinkBack Thread Tools
  #1  
Old November 12th, 2005, 12:40 AM
Troels Arvin
Guest
 
Posts: n/a
Default Storage consumption

Hello,

For some very data-intensive projects it's interesting how much space the
DBMS uses for the storage of data, so I'm investigating how space
efficient different DBMSes are.

In the PostgreSQL manual, it's written that values of the type INTEGER
take op four bytes. I was curious about how close to real-World this
number is, so I did a test: How much space does PostgreSQL use when
storing 100000 rows where each row consists of a single INTEGER value?

With help from http://random.org/ I created a file with 100000 random
integer insertions. The SQL used to do that is available at
http://troels.arvin.dk/db/tests/stor...randomints.zip

About installation: PostgreSQL v. 7.3.4 on Red Hat Linux 9, file system
ext3. PostgreSQL data-area in /var/lib/pgsql/data.

For this test, PostgreSQL is being used for nothing else.

Before test start:
-----------------
Access to a default database ('psql' brings you right into a working
database) from psql.
Access to do a 'du' (disk usage unix-command) on /var/lib/pgsql/data from
the command line.
No existing table 'inttab' in database. PostgreSQL stopped.

Test starts.
-----------
Output of 'du -sb /var/lib/pgsql/data': 77946519.
Start PostgreSQL.
Do: "CREATE TABLE inttab (intval INT) WITHOUT OIDS;"
psql -q -f random_ints.sql
(Wait for a long time.)
Do: "VACUUM FULL;"
Shut down PostgreSQL.
Output of 'du -sb /var/lib/pgsql/data': 81190551.

Result:
------
Real difference: 81190551-77946519 = 3244032
Optimal difference: 100000*4 = 400000
Storage consumption rate ((real/optimal)*100)% = 811%

I'm surprised by an overhead _that_ high. Any comments on my methology?
Does it need adjustments? If you think it's rotten: What methology would
you use to measure space overhead for a DBMS? (Again: Space overhead is
seldomly interesting, but sometimes it is.)

I guess that transaction log files are a joker in this context, but then
again: A number which reflects the DBMS' disk usage before and after an
operation does have real-World meaning, I think.

(Of course, I'll need another methology for DBMSes which preallocate a
fixed amount of storage for a database.)

--
Greetings from Troels Arvin, Copenhagen, Denmark



---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

  #2  
Old November 12th, 2005, 12:40 AM
Bruce Momjian
Guest
 
Posts: n/a
Default Re: Storage consumption


Did you see the FAQ item on estimating disk space?

---------------------------------------------------------------------------

Troels Arvin wrote:[color=blue]
> Hello,
>
> For some very data-intensive projects it's interesting how much space the
> DBMS uses for the storage of data, so I'm investigating how space
> efficient different DBMSes are.
>
> In the PostgreSQL manual, it's written that values of the type INTEGER
> take op four bytes. I was curious about how close to real-World this
> number is, so I did a test: How much space does PostgreSQL use when
> storing 100000 rows where each row consists of a single INTEGER value?
>
> With help from http://random.org/ I created a file with 100000 random
> integer insertions. The SQL used to do that is available at
> http://troels.arvin.dk/db/tests/stor...randomints.zip
>
> About installation: PostgreSQL v. 7.3.4 on Red Hat Linux 9, file system
> ext3. PostgreSQL data-area in /var/lib/pgsql/data.
>
> For this test, PostgreSQL is being used for nothing else.
>
> Before test start:
> -----------------
> Access to a default database ('psql' brings you right into a working
> database) from psql.
> Access to do a 'du' (disk usage unix-command) on /var/lib/pgsql/data from
> the command line.
> No existing table 'inttab' in database. PostgreSQL stopped.
>
> Test starts.
> -----------
> Output of 'du -sb /var/lib/pgsql/data': 77946519.
> Start PostgreSQL.
> Do: "CREATE TABLE inttab (intval INT) WITHOUT OIDS;"
> psql -q -f random_ints.sql
> (Wait for a long time.)
> Do: "VACUUM FULL;"
> Shut down PostgreSQL.
> Output of 'du -sb /var/lib/pgsql/data': 81190551.
>
> Result:
> ------
> Real difference: 81190551-77946519 = 3244032
> Optimal difference: 100000*4 = 400000
> Storage consumption rate ((real/optimal)*100)% = 811%
>
> I'm surprised by an overhead _that_ high. Any comments on my methology?
> Does it need adjustments? If you think it's rotten: What methology would
> you use to measure space overhead for a DBMS? (Again: Space overhead is
> seldomly interesting, but sometimes it is.)
>
> I guess that transaction log files are a joker in this context, but then
> again: A number which reflects the DBMS' disk usage before and after an
> operation does have real-World meaning, I think.
>
> (Of course, I'll need another methology for DBMSes which preallocate a
> fixed amount of storage for a database.)
>
> --
> Greetings from Troels Arvin, Copenhagen, Denmark
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html
>[/color]

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

  #3  
Old November 12th, 2005, 12:41 AM
Kris Jurka
Guest
 
Posts: n/a
Default Re: Storage consumption



On Fri, 14 Nov 2003, Troels Arvin wrote:
[color=blue]
> Hello,
>
> For some very data-intensive projects it's interesting how much space the
> DBMS uses for the storage of data, so I'm investigating how space
> efficient different DBMSes are.
>
> In the PostgreSQL manual, it's written that values of the type INTEGER
> take op four bytes. I was curious about how close to real-World this
> number is, so I did a test: How much space does PostgreSQL use when
> storing 100000 rows where each row consists of a single INTEGER value?
>[/color]

You are measuring the space used to store one row of one int column. To
test the space used just by one int column a more accurate test would be
to measure the difference in disk usage between a table with one int
column and a table with two int columns.

Kris Jurka


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

 

Bookmarks

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On

What is Bytes?

We are a network of experts and professionals in IT and software development that help one another with answers to tough questions and share insights. Get the best answers to your questions from over network members.
Post your question now . . .
It's fast and it's free

Popular Articles