
Slow dump with pg_dump/pg_restore? How to improve?

Hi,

some weeks ago I started to develop an application using PostgreSQL for the
first time. I'm running 7.4.3 on a Linux box with a plain 2.6.7 kernel; the
storage is handled by 5 SATA disks managed by a 3ware controller, using an
XFS filesystem. The DB server is a 3 GHz P4 with 4 GB of RAM, so the machine
is quite fast for most purposes I need it for. It is a dedicated DB server;
there's no application running on it.

I dump and restore it like this:

pg_dump -v ziptrader -o -F t > foo
pg_restore -v -c -F t -d ziptrader_expost foo

Dumping needs to be done at least once a day (for copying as well as
archiving), while restoring to two other databases will be done twice a
day. The dump is currently only 80 MB, which I consider very small (the
expected growth of the database will be 20-30 MB a day later on). Currently
the database has just 6 tables, with 2 tables being responsible for 95% of
the database size.
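
To double-check which tables dominate, here is a rough sketch of a query against the planner statistics (assuming the default 8 KB block size; relpages/reltuples are only estimates and need a prior ANALYZE):

psql ziptrader -c "ANALYZE;"
psql ziptrader -c "SELECT relname, relpages * 8 AS approx_kb, reltuples FROM pg_class WHERE relkind = 'r' ORDER BY relpages DESC LIMIT 10;"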

So... the dump in the above format needs some 14 minutes, the restore 10
minutes. This seems to be very slow, as it means something like 100 KB/sec
for dumping and restoring. The drive is capable of 40 MB/sec, so that's
not the bottleneck ,-) Anyhow, postmaster and pg_dump seem to max out the
CPU, as they run at nearly 100% while dumping and restoring. But I dread
the day when the database grows beyond 1 GB, as it would then take hours
to complete, which is not acceptable to me.

My main memory settings are:

shared_buffers = 30000
sort_mem = 1000
vacuum_mem = 16384

What is the best starting point to look at? I'm sure I'm doing something
badly wrong but neither the docs nor some older mails from this group seem
to offer a hint to me.

Thanks a lot,
Soeren Gerlach


Nov 23 '05 #1
6 Replies


Soeren Gerlach wrote:

Dumping needs to be done at least once a day (for copying as well as
archiving), while restoring to two other databases will be done twice a
day. The dump is currently only 80 MB, which I consider very small (the
expected growth of the database will be 20-30 MB a day later on). Currently
the database has just 6 tables, with 2 tables being responsible for 95% of
the database size.

So... the dump in the above format needs some 14 minutes, the restore 10
minutes. This seems to be very slow, as it means something like 100 KB/sec
for dumping and restoring. The drive is capable of 40 MB/sec, so that's
not the bottleneck ,-) Anyhow, postmaster and pg_dump seem to max out the
CPU, as they run at nearly 100% while dumping and restoring. But I dread
the day when the database grows beyond 1 GB, as it would then take hours
to complete, which is not acceptable to me.


This is never going to work for you. Increasing the sort_mem during the
restore will probably help, but if it takes too long now then it
certainly won't cope when you have 10 times as much data.
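
As a rough sketch, sort_mem (in KB on 7.4) can be raised just for the restore session through PGOPTIONS, so postgresql.conf stays untouched; the 65536 figure below is only an example, not a recommendation:

PGOPTIONS="-c sort_mem=65536" pg_restore -v -c -F t -d ziptrader_expost foo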

The CPU is probably maxed as pg_restore rebuilds your indexes etc.
You'll find that the disk space occupied by the live system will be much more
than your current 80 MB.

I think you want to look at some form of replication. There are a number
of options, the newest being Slony (http://www.slony.org/).

--
Richard Huxton
Archonet Ltd


Nov 23 '05 #2

Soeren Gerlach <so****@all-about-shift.com> writes:
So... the dump in the above format needs some 14 minutes, the restore 10
minutes. This seems to be very slow, as it means something like 100 KB/sec
for dumping and restoring. The drive is capable of 40 MB/sec, so that's
not the bottleneck ,-) Anyhow, postmaster and pg_dump seem to max out the
CPU, as they run at nearly 100% while dumping and restoring.


What datatypes have you got in the large tables? Also, what character
set encoding are you using?

The only reason I can think of for dump to be that slow is if conversion
of the data to text is a big time sink. This would involve the
datatype's own output routine plus possibly a character set conversion.
You should at least make sure that no character set conversion needs to
happen (offhand I think this would only be an issue if pg_dump is
invoked with PGCLIENTENCODING set in its environment).

Also I trust you are using dump with the default COPY-style output,
not dump-as-INSERTs?
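
A quick way to rule both of these out, as a sketch (and if I recall the 7.4-era pg_dump flags correctly, -d/-D/--inserts are what would switch it to INSERT statements):

env | grep PGCLIENTENCODING          # should print nothing if the variable is unset
psql ziptrader -c "SHOW client_encoding;"
psql ziptrader -c "SHOW server_encoding;"
# and invoke pg_dump WITHOUT -d, -D or --inserts, so the data goes out via COPY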

regards, tom lane


Nov 23 '05 #3

So... the dump in the above format needs some 14 minutes, the restore 10
minutes. This seems to be very slow, as it means something like 100 KB/sec
for dumping and restoring. The drive is capable of 40 MB/sec, so that's
not the bottleneck ,-) Anyhow, postmaster and pg_dump seem to max out the
CPU, as they run at nearly 100% while dumping and restoring.
What datatypes have you got in the large tables? Also, what character
set encoding are you using?


datestyle = 'ISO,European' ?
The only reason I can think of for dump to be that slow is if conversion
of the data to text is a big time sink. This would involve the
datatype's own output routine plus possibly a character set conversion.
You should at least make sure that no character set conversion needs to
happen (offhand I think this would only be an issue if pg_dump is
invoked with PGCLIENTENCODING set in its environment).
These are the schemas of the two big tables mentioned:

-------------------------------------------------------------------------
CREATE TABLE public.agent_action_history
(
    aah_tag_id      bigserial NOT NULL,
    aah_action_type int2      NOT NULL DEFAULT 0,
    aah_quantity    float4    NOT NULL DEFAULT 0,
    aah_price       float4    NOT NULL DEFAULT 0,
    aah_sim_flag    bool      NOT NULL DEFAULT false,
    aah_timestamp   timestamp NOT NULL,
    aah_action_pl   float4    NOT NULL DEFAULT 0
) WITHOUT OIDS;

CREATE TABLE public.tick_history
(
    tkh_id            serial       NOT NULL,
    tkh_cdt_id        int8         NOT NULL,
    tkh_price         float4       NOT NULL,
    tkh_price_type    int2         NOT NULL,
    tkh_volume        float4       NOT NULL,
    tkh_system_time   timestamp(6) NOT NULL,
    tkh_exchange_time timestamp(6) NOT NULL,
    CONSTRAINT "PK_tick_history" PRIMARY KEY (tkh_id)
) WITHOUT OIDS;
-------------------------------------------------------------------------
Also I trust you are using dump with the default COPY-style output,
not dump-as-INSERTs?


Yes. I'm dumping using "-c -Fc -v" as options. I just reran the whole thing:

* Total rows: 904,000
* Dumping plain format: 21min == 1260 sec == 717 rows/sec
* Dumping "-c -Fc" format: 26min == 1560 sec == 580 rows/sec
* pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for both
dumps
* the resulting "-Fc" dump is 1/10 of the size of the plain dump

It would be interesting to know from people with DBs > 1 GB how fast they
can dump, in terms of rows per second.

Another note: I just noticed that I'm running 7.4.1, not 7.4.3. So I'll
retest this tomorrow and provide you with numbers for 7.4.3 too.
I'm likely to install Sybase next week as a reference point for Postgres. I
remember getting something like 5,000-10,000 rows/sec about a year ago when
I tested version 11.5 for this same issue.

Thanks,
Soeren Gerlach

Nov 23 '05 #4

Soeren Gerlach <so****@all-about-shift.com> writes:
* pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for both
dumps


Really!? Hmm, that seems fairly backwards ... thinks ...

In the -Fc case this makes some amount of sense because pg_dump runs
gzip-style compression on the data (which is why the output file is
so much smaller). But for plain text dump, pg_dump should be just
pushing the data straight through to stdout; it really ought not take
much CPU as far as I can see. There may be some simple performance
glitch involved there. Are you interested in recompiling with -pg
and getting a gprof profile of pg_dump?
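
If compression turns out to be the dominant cost in the -Fc case, a quick experiment (sketch only; -Z sets the compression level of the custom archive format) is to time a dump with compression switched off against a compressed one:

time pg_dump -F c -Z 0 ziptrader -f ziptrader_nocomp.dmp    # custom format, no compression
time pg_dump -F c -Z 6 ziptrader -f ziptrader_comp.dmp      # custom format, gzip level 6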

regards, tom lane


Nov 23 '05 #5

Soeren Gerlach <so****@all-about-shift.com> writes:
* pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for
both dumps


Really!? Hmm, that seems fairly backwards ... thinks ...

In the -Fc case this makes some amount of sense because pg_dump runs
gzip-style compression on the data (which is why the output file is
so much smaller). But for plain text dump, pg_dump should be just
pushing the data straight through to stdout; it really ought not take
much CPU as far as I can see. There may be some simple performance
glitch involved there. Are you interested in recompiling with -pg
and getting a gprof profile of pg_dump?


Yes I am, but I'm a little short on time ,-)) In fact, I'm glad to download a
ready-to-run archive for Debian Woody. In two weeks I'll have some time to
check this issue with my own compiled versions; until then I'm just
interested ,-))
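
For reference, a rough sketch of what such a profiling run might look like (paths, prefix and options are illustrative only, not taken from my actual setup):

cd postgresql-7.4.3
./configure --prefix=$HOME/pgsql-prof CFLAGS="-O2 -pg"
make && make install
# run the dump once with the profiled binary; it writes gmon.out to the current directory
$HOME/pgsql-prof/bin/pg_dump -F t ziptrader -f /tmp/foo.tar
gprof $HOME/pgsql-prof/bin/pg_dump gmon.out | head -40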

Today I upgraded from 7.4.1 to 7.4.3, but this did not change anything
noticeable.

Do you have any speed numbers (rows per second) available for comparison,
i.e. a dump on a single-CPU machine with quite fast drives?

Regards,
Soeren

Nov 23 '05 #6



Soeren Gerlach wrote:
Soeren Gerlach <so****@all-about-shift.com> writes:

* pg_dump takes 2/3 of the [single] CPU, postmaster the other 1/3 for
both dumps

Do you have any speed numbers (rows per second) available for comparison,
i.e. a dump on a single-CPU machine with quite fast drives?
Regards,
Soeren

Hi Soeren,

I just used pg_dump on my database with your settings to give you a
reference point; however, my machine is a quad Xeon, otherwise comparable
to yours (2 GB RAM, 15K SCSI). top showed nominal CPU usage of ~20-25%
during the run, which on four CPUs corresponds to roughly one CPU maxed
out, so that agrees with your high CPU utilization. But the dump took
140 secs (elapsed time) for a 2.5 GB (on-disk) DB. The .dmp file was
650 MB, and the total row count was ~3M. So that makes ~20,000 rows/sec,
which certainly suggests something is amiss with your hardware or software
or configuration or ...

Don

p.s. I am running PgSQL 7.4 on Red Hat 3.2.3 (Linux kernel 2.4.21-9 SMP)


Nov 23 '05 #7
