
Slow dump with pg_dump/pg_restore? How to improve?

Hi,

some weeks ago I started to develop an application using PostgreSQL for the
first time. I'm running 7.4.3 on a Linux box with a plain 2.6.7 kernel; the
storage is handled by 5 SATA disks managed by a 3ware controller, using an
XFS filesystem. The DB server is a 3 GHz P4 with 4 GB of RAM, so the machine
is quite fast for most purposes I need it for. It is a dedicated DB server;
there's no application running on it.

I dump and restore it like this:

pg_dump -v ziptrader -o -F t > foo
pg_restore -v -c -F t -d ziptrader_expost foo

Dumping needs to be done at least once a day (for copying as well as
archiving), while restoring to two other databases will be done twice a
day. The dump is currently only 80 MB, which I consider very small (the
expected growth of the database will be 20-30 MB a day later on). Currently
the database has just 6 tables, with 2 tables being responsible for 95% of
the database size.
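
To double-check which tables dominate, here is a rough sketch of a query against the planner statistics (assuming the default 8 KB block size; relpages/reltuples are only estimates and need a prior ANALYZE):

psql ziptrader -c "ANALYZE;"
psql ziptrader -c "SELECT relname, relpages * 8 AS approx_kb, reltuples FROM pg_class WHERE relkind = 'r' ORDER BY relpages DESC LIMIT 10;"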

So... the dump in the above format needs some 14 minutes, the restore 10
minutes. This seems to be very slow, as it means something like 100 KB/sec
for dumping and restoring. The drive is capable of 40 MB/sec, so that's
not the bottleneck ,-) Anyhow, postmaster and pg_dump seem to max out the
CPU, as they run at nearly 100% while dumping and restoring. But I dread
the day when the database grows beyond 1 GB, as it would then take hours
to complete, which is not acceptable to me.

My main memory settings are:

shared_buffers = 30000
sort_mem = 1000
vacuum_mem = 16384

What is the best starting point to look at? I'm sure I'm doing something
badly wrong but neither the docs nor some older mails from this group seem
to offer a hint to me.

Thanks a lot,
Soeren Gerlach


Nov 23 '05 #1
6 Replies


Soeren Gerlach wrote:

Dumping needs to be done at least once a day (for copying as well as
archiving), while restoring to two other databases will be done twice a
day. The dump is currently only 80 MB, which I consider very small (the
expected growth of the database will be 20-30 MB a day later on). Currently
the database has just 6 tables, with 2 tables being responsible for 95% of
the database size.

So... the dump in the above format needs some 14 minutes, the restore 10
minutes. This seems to be very slow, as it means something like 100 KB/sec
for dumping and restoring. The drive is capable of 40 MB/sec, so that's
not the bottleneck ,-) Anyhow, postmaster and pg_dump seem to max out the
CPU, as they run at nearly 100% while dumping and restoring. But I dread
the day when the database grows beyond 1 GB, as it would then take hours
to complete, which is not acceptable to me.


This is never going to work for you. Increasing the sort_mem during the
restore will probably help, but if it takes too long now then it
certainly won't cope when you have 10 times as much data.
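
As a rough sketch, sort_mem (in KB on 7.4) can be raised just for the restore session through PGOPTIONS, so postgresql.conf stays untouched; the 65536 figure below is only an example, not a recommendation:

PGOPTIONS="-c sort_mem=65536" pg_restore -v -c -F t -d ziptrader_expost foo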

The CPU is probably maxed as pg_restore rebuilds your indexes etc.
You'll find that the disk space occupied by the live system will be much more
than your current 80 MB.

I think you want to look at some form of replication. There are a number
of options, the newest being Slony (http://www.slony.org/).

--
Richard Huxton
Archonet Ltd


Nov 23 '05 #2

Soeren Gerlach <so****@all-about-shift.com> writes:
So... the dump in the above format needs some 14 minutes, the restore 10
minutes. This seems to be very slow, as it means something like 100 KB/sec
for dumping and restoring. The drive is capable of 40 MB/sec, so that's
not the bottleneck ,-) Anyhow, postmaster and pg_dump seem to max out the
CPU, as they run at nearly 100% while dumping and restoring.


What datatypes have you got in the large tables? Also, what character
set encoding are you using?

The only reason I can think of for dump to be that slow is if conversion
of the data to text is a big time sink. This would involve the
datatype's own output routine plus possibly a character set conversion.
You should at least make sure that no character set conversion needs to
happen (offhand I think this would only be an issue if pg_dump is
invoked with PGCLIENTENCODING set in its environment).

Also I trust you are using dump with the default COPY-style output,
not dump-as-INSERTs?
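
A quick way to rule both of these out, as a sketch (and if I recall the 7.4-era pg_dump flags correctly, -d/-D/--inserts are what would switch it to INSERT statements):

env | grep PGCLIENTENCODING          # should print nothing if the variable is unset
psql ziptrader -c "SHOW client_encoding;"
psql ziptrader -c "SHOW server_encoding;"
# and invoke pg_dump WITHOUT -d, -D or --inserts, so the data goes out via COPY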

regards, tom lane


Nov 23 '05 #3

So... the dump in the above format needs some 14 minutes, the restore 10
minutes. This seems to be very slow, as it means something like 100 KB/sec
for dumping and restoring. The drive is capable of 40 MB/sec, so that's
not the bottleneck ,-) Anyhow, postmaster and pg_dump seem to max out the
CPU, as they run at nearly 100% while dumping and restoring.
What datatypes have you got in the large tables? Also, what character
set encoding are you using?


datestyle = 'ISO,European' ?
The only reason I can think of for dump to be that slow is if conversion
of the data to text is a big time sink. This would involve the
datatype's own output routine plus possibly a character set conversion.
You should at least make sure that no character set conversion needs to
happen (offhand I think this would only be an issue if pg_dump is
invoked with PGCLIENTENCODING set in its environment).
These are the schemas of the two big tables mentioned:

-------------------------------------------------------------------------
CREATE TABLE public.agent_action_history
(
    aah_tag_id      bigserial NOT NULL,
    aah_action_type int2      NOT NULL DEFAULT 0,
    aah_quantity    float4    NOT NULL DEFAULT 0,
    aah_price       float4    NOT NULL DEFAULT 0,
    aah_sim_flag    bool      NOT NULL DEFAULT false,
    aah_timestamp   timestamp NOT NULL,
    aah_action_pl   float4    NOT NULL DEFAULT 0
) WITHOUT OIDS;

CREATE TABLE public.tick_history
(
    tkh_id            serial       NOT NULL,
    tkh_cdt_id        int8         NOT NULL,
    tkh_price         float4       NOT NULL,
    tkh_price_type    int2         NOT NULL,
    tkh_volume        float4       NOT NULL,
    tkh_system_time   timestamp(6) NOT NULL,
    tkh_exchange_time timestamp(6) NOT NULL,
    CONSTRAINT "PK_tick_history" PRIMARY KEY (tkh_id)
) WITHOUT OIDS;
-------------------------------------------------------------------------
Also I trust you are using dump with the default COPY-style output,
not dump-as-INSERTs?


Yes. I'm dumping using "-c -Fc -v" as options. I just reran the whole thing:

* Total rows: 904,000
* Dumping plain format: 21min == 1260 sec == 717 rows/sec
* Dumping "-c -Fc" format: 26min == 1560 sec == 580 rows/sec
* pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for both
dumps
* the resulting "-Fc" dump is 1/10 of the size of the plain dump

It would be interesting to know from people with DBs > 1 GB how fast they
can dump, in terms of rows per second.

Another note: I just noticed that I'm running 7.4.1, not 7.4.3. So I'll
retest this tomorrow and provide you with numbers for 7.4.3 too.
I'm likely to install Sybase next week as a reference point for Postgres. I
remember getting something like 5,000-10,000 rows/sec about a year ago when
I tested version 11.5 for this same issue.

Thanks,
Soeren Gerlach

Nov 23 '05 #4

Soeren Gerlach <so****@all-about-shift.com> writes:
* pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for both
dumps


Really!? Hmm, that seems fairly backwards ... thinks ...

In the -Fc case this makes some amount of sense because pg_dump runs
gzip-style compression on the data (which is why the output file is
so much smaller). But for plain text dump, pg_dump should be just
pushing the data straight through to stdout; it really ought not take
much CPU as far as I can see. There may be some simple performance
glitch involved there. Are you interested in recompiling with -pg
and getting a gprof profile of pg_dump?
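
If compression turns out to be the dominant cost in the -Fc case, a quick experiment (sketch only; -Z sets the compression level of the custom archive format) is to time a dump with compression switched off against a compressed one:

time pg_dump -F c -Z 0 ziptrader -f ziptrader_nocomp.dmp    # custom format, no compression
time pg_dump -F c -Z 6 ziptrader -f ziptrader_comp.dmp      # custom format, gzip level 6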

regards, tom lane


Nov 23 '05 #5

Soeren Gerlach <so****@all-about-shift.com> writes:
* pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for
both dumps


Really!? Hmm, that seems fairly backwards ... thinks ...

In the -Fc case this makes some amount of sense because pg_dump runs
gzip-style compression on the data (which is why the output file is
so much smaller). But for plain text dump, pg_dump should be just
pushing the data straight through to stdout; it really ought not take
much CPU as far as I can see. There may be some simple performance
glitch involved there. Are you interested in recompiling with -pg
and getting a gprof profile of pg_dump?


Yes I am, but I'm a little short on time ,-)) In fact, I'm glad to download a
ready-to-run archive for Debian Woody. In two weeks I'll have some time to
check this issue with my own compiled versions; until then I'm just
interested ,-))
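
For reference, a rough sketch of what such a profiling run might look like (paths, prefix and options are illustrative only, not taken from my actual setup):

cd postgresql-7.4.3
./configure --prefix=$HOME/pgsql-prof CFLAGS="-O2 -pg"
make && make install
# run the dump once with the profiled binary; it writes gmon.out to the current directory
$HOME/pgsql-prof/bin/pg_dump -F t ziptrader -f /tmp/foo.tar
gprof $HOME/pgsql-prof/bin/pg_dump gmon.out | head -40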

Today I upgraded from 7.4.1 to 7.4.3, but this did not change anything
noticeable.

Do you have any speed numbers (rows per second) available for comparison,
i.e. a dump on a single-CPU machine with quite fast drives?

Regards,
Soeren

Nov 23 '05 #6



Soeren Gerlach wrote:
Soeren Gerlach <so****@all-about-shift.com> writes:

* pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for
both dumps

Do you have any speed numbers (rows per second) available for comparison,
i.e. a dump on a single-CPU machine with quite fast drives?
Regards,
Soeren

Hi Soeren,

I just used pg_dump on my database with your settings to give you a
reference point; however, my machine is a quad Xeon, otherwise comparable
to yours (2 GB RAM, 15K SCSI). top showed nominal CPU usage of ~20-25%
during the run, which on four CPUs corresponds to roughly one CPU maxed
out, so that agrees with your high CPU utilization. But the dump took
140 secs (elapsed time) for a 2.5 GB (on-disk) DB. The .dmp file was
650 MB, and the total row count was ~3M. So that makes ~20,000 rows/sec,
which certainly suggests something is amiss with your hardware or software
or configuration or ...

Don

p.s. I am running PgSQL 7.4 on Red Hat 3.2.3 (Linux kernel 2.4.21-9 SMP)


Nov 23 '05 #7
