Slow dump with pg_dump/pg_restore ? How to improve ?

Hi,

some weeks ago I started to develop an application using PostgreSQL for the
first time. I'm running 7.4.3 on a Linux box with a plain 2.6.7 kernel; the
storage is handled by 5 SATA disks managed by a 3ware controller, using
an XFS filesystem. The DB server is a 3 GHz P4 with 4 GB of RAM, so the
machine is quite fast for most purposes I need it for. It is a pure
DB server; there's no application running on it.

I dump and restore it like this:

pg_dump -v ziptrader -o -F t > foo
pg_restore -v -c -F t -d ziptrader_expost foo

Dumping needs to be done at least once a day (for copying as well as
archiving), while restoring to two other databases will be done twice a
day. The dump is currently only 80 MB, which I consider very small (the
expected growth of the database will be 20-30 MB a day later on). Currently
the database has just 6 tables, with 2 tables being responsible for 95% of
the database size.

So... the dump in the above format needs some 14 minutes, the restore 10
minutes. This seems very slow, as it means something like 100 KB/sec for
dumping and restoring. The drive is capable of 40 MB/sec, so that's
not the bottleneck ,-) Anyhow, postmaster and pg_dump seem to max out the
CPU, as they run at nearly 100% while dumping and restoring. But
I dread the day when the database grows beyond 1 GB, as it would then take
hours to complete, which is not acceptable to me.
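
A quick way to sanity-check that 40 MB/sec figure for the array, independent of Postgres, would be something like the following (a sketch; the /data path is only illustrative):

# write ~1 GB with a sync, then read it back, to confirm raw sequential throughput
time sh -c 'dd if=/dev/zero of=/data/ddtest bs=8k count=125000 && sync'
time dd if=/data/ddtest of=/dev/null bs=8k
rm /data/ddtest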

My main memory settings are:

shared_buffers = 30000
sort_mem = 1000
vacuum_mem = 16384

What is the best starting point to look at? I'm sure I'm doing something
badly wrong, but neither the docs nor some older mails from this group seem
to offer a hint to me.

Thanks a lot,
Soeren Gerlach


Nov 23 '05 #1
Soeren Gerlach wrote:

Dumping needs to be done at least once a day (for copying as well as
archiving), while restoring to two other databases will be done twice a
day. The dump is currently only 80 MB, which I consider very small (the
expected growth of the database will be 20-30 MB a day later on). Currently
the database has just 6 tables, with 2 tables being responsible for 95% of
the database size.

So... the dump in the above format needs some 14 minutes, the restore 10
minutes. This seems very slow, as it means something like 100 KB/sec for
dumping and restoring. The drive is capable of 40 MB/sec, so that's
not the bottleneck ,-) Anyhow, postmaster and pg_dump seem to max out the
CPU, as they run at nearly 100% while dumping and restoring. But
I dread the day when the database grows beyond 1 GB, as it would then take
hours to complete, which is not acceptable to me.


This is never going to work for you. Increasing the sort_mem during the
restore will probably help, but if it takes too long now then it
certainly won't cope when you have 10 times as much data.
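
One low-impact way to try that is to raise sort_mem only for the backend doing the restore, via PGOPTIONS, rather than editing postgresql.conf (a sketch; the 64 MB value is purely illustrative, and the database name is taken from the restore command quoted earlier):

# give the restoring backend 64 MB of sort memory for its index rebuilds
PGOPTIONS="-c sort_mem=65536" pg_restore -v -c -F t -d ziptrader_expost foo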

The CPU is probably maxed as pg_restore rebuilds your indexes etc.
You'll find the disk-space occupied by the live system will be much more
than your current 80MB.

I think you want to look at some form of replication. There are a number
of options, the newest being Slony (http://www.slony.org/).

--
Richard Huxton
Archonet Ltd


Nov 23 '05 #2
Soeren Gerlach <so****@all-about-shift.com> writes:
So... the dump in the above format needs some 14 minutes, the restore 10
minutes. This seems very slow, as it means something like 100 KB/sec for
dumping and restoring. The drive is capable of 40 MB/sec, so that's
not the bottleneck ,-) Anyhow, postmaster and pg_dump seem to max out the
CPU, as they run at nearly 100% while dumping and restoring.


What datatypes have you got in the large tables? Also, what character
set encoding are you using?

The only reason I can think of for dump to be that slow is if conversion
of the data to text is a big time sink. This would involve the
datatype's own output routine plus possibly a character set conversion.
You should at least make sure that no character set conversion needs to
happen (offhand I think this would only be an issue if pg_dump is
invoked with PGCLIENTENCODING set in its environment).
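
A quick way to check both sides of that (a sketch, using the database name from the original post):

# what the server stores, and whether the client environment would force a conversion
psql -d ziptrader -c "SHOW server_encoding;"
env | grep PGCLIENTENCODING || echo "PGCLIENTENCODING not set - dump stays in the server encoding"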

Also I trust you are using dump with the default COPY-style output,
not dump-as-INSERTs?
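
For reference, the distinction looks roughly like this on the command line (a sketch for the 7.4-era pg_dump, where -d/-D select INSERT-style output rather than naming a database):

pg_dump ziptrader > ziptrader_copy.sql        # default: COPY blocks, fast to restore
pg_dump -d ziptrader > ziptrader_inserts.sql  # one INSERT per row, much slower to restore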

regards, tom lane


Nov 23 '05 #3
So... the dump in the above format needs some 14 minutes, the restore 10
minutes. This seems very slow, as it means something like 100 KB/sec
for dumping and restoring. The drive is capable of 40 MB/sec, so
that's not the bottleneck ,-) Anyhow, postmaster and pg_dump seem to max
out the CPU, as they run at nearly 100% while dumping and
restoring.
What datatypes have you got in the large tables? Also, what character
set encoding are you using?


datestyle = 'ISO,European' ?
The only reason I can think of for dump to be that slow is if conversion
of the data to text is a big time sink. This would involve the
datatype's own output routine plus possibly a character set conversion.
You should at least make sure that no character set conversion needs to
happen (offhand I think this would only be an issue if pg_dump is
invoked with PGCLIENTENCODING set in its environment).

These are the schemas of the two big tables mentioned:

-------------------------------------------------------------------------
CREATE TABLE public.agent_action_history
(
  aah_tag_id      bigserial NOT NULL,
  aah_action_type int2      NOT NULL DEFAULT 0,
  aah_quantity    float4    NOT NULL DEFAULT 0,
  aah_price       float4    NOT NULL DEFAULT 0,
  aah_sim_flag    bool      NOT NULL DEFAULT false,
  aah_timestamp   timestamp NOT NULL,
  aah_action_pl   float4    NOT NULL DEFAULT 0
) WITHOUT OIDS;

CREATE TABLE public.tick_history
(
  tkh_id            serial       NOT NULL,
  tkh_cdt_id        int8         NOT NULL,
  tkh_price         float4       NOT NULL,
  tkh_price_type    int2         NOT NULL,
  tkh_volume        float4       NOT NULL,
  tkh_system_time   timestamp(6) NOT NULL,
  tkh_exchange_time timestamp(6) NOT NULL,
  CONSTRAINT "PK_tick_history" PRIMARY KEY (tkh_id)
) WITHOUT OIDS;
-------------------------------------------------------------------------
Also I trust you are using dump with the default COPY-style output,
not dump-as-INSERTs?


Yes. I'm dumping using "-c -Fc -v" as options. I just reran the whole thing:

* Total rows: 904,000
* Dumping plain format: 21min == 1260 sec == 717 rows/sec
* Dumping "-c -Fc" format: 26min == 1560 sec == 580 rows/sec
* pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for both
dumps
* the resulting "-Fc" dump is 1/10 of the size of the plain dump

It would be interesting to know from people with DBs > 1 Gig how fast they
can dump in terms of rows/second.
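
One way to see how much of the time goes into pg_dump itself versus the backend's row-to-text conversion would be to time a raw COPY of the big table on its own (a sketch, using the table and database names from above):

# raw COPY-to-text speed of the backend, with pg_dump taken out of the picture
time psql -d ziptrader -c "COPY tick_history TO stdout" > /dev/null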

One more bit of info: I just noticed that I'm running 7.4.1, not 7.4.3, so I'll
retest this tomorrow and provide you with numbers for 7.4.3 too.
I'm likely to install Sybase next week as a reference against Postgres. I can
remember getting something like 5-10,000 rows/sec about a year ago when I
tested an 11.5 version for this as well.

Thanks,
Soeren Gerlach

Nov 23 '05 #4
Soeren Gerlach <so****@all-about-shift.com> writes:
* pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for both
dumps


Really!? Hmm, that seems fairly backwards ... thinks ...

In the -Fc case this makes some amount of sense because pg_dump runs
gzip-style compression on the data (which is why the output file is
so much smaller). But for plain text dump, pg_dump should be just
pushing the data straight through to stdout; it really ought not take
much CPU as far as I can see. There may be some simple performance
glitch involved there. Are you interested in recompiling with -pg
and getting a gprof profile of pg_dump?
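
For anyone wanting to try that, the rough procedure would look something like this (a sketch; the configure flags and paths are illustrative, not instructions from this thread):

# build a profiling-enabled pg_dump from the 7.4 source tree
./configure CFLAGS="-O2 -pg" --prefix=$HOME/pgsql-prof
make && make install

# run the dump once; the profile data lands in gmon.out in the current directory
$HOME/pgsql-prof/bin/pg_dump -F c ziptrader > /dev/null
gprof $HOME/pgsql-prof/bin/pg_dump gmon.out | head -40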

regards, tom lane


Nov 23 '05 #5
Soeren Gerlach <so****@all-about-shift.com> writes:
* pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for
both dumps


Really!? Hmm, that seems fairly backwards ... thinks ...

In the -Fc case this makes some amount of sense because pg_dump runs
gzip-style compression on the data (which is why the output file is
so much smaller). But for plain text dump, pg_dump should be just
pushing the data straight through to stdout; it really ought not take
much CPU as far as I can see. There may be some simple performance
glitch involved there. Are you interested in recompiling with -pg
and getting a gprof profile of pg_dump?


Yes I am, but I'm a little short on time ,-)) For now I'm glad to download a
ready-to-run archive for Debian Woody. In two weeks I'll have some time to
check this issue with my own compiled builds; until then I'm just
interested ,-))

Today I upgraded from 7.4.1 to 7.4.3, but this did not really change anything.

Do you have any numbers regarding speed (rows per second) for comparison,
i.e. a dump on a single-CPU machine with quite fast drives?
Regards,
Soeren

Nov 23 '05 #6


Soeren Gerlach wrote:
Soeren Gerlach <so****@all-about-shift.com> writes:

* pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for
both dumps

Do you have any numbers regarding speed (rows per second) for comparison,
i.e. a dump on a single-CPU machine with quite fast drives?
Regards,
Soeren

Hi Soeren,

I just used pg_dump on my database with your settings to give you a
reference point; however, my machine is a quad Xeon, otherwise
comparable to yours (2 GB RAM, 15K SCSI). Top showed nominal CPU usage of
~20-25% during the run, so that agrees with your high CPU utilization.
But the dump took 140 secs (elapsed time) for a 2.5 GB (on disk) db. The
.dmp file was 650 MB, and total rows were ~3M. So that makes ~20,000
rows/sec, which certainly suggests something is amiss with your hardware
or software or configuration or ...

Don

p.s. I am running PgSQL 7.4 on Red Hat 3.2.3 (Linux kernel 2.4.21-9 SMP)


Nov 23 '05 #7
