473,725 Members | 1,787 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

pg_restore takes ages

Hi,
I use pg_restore to load a previously dumped database. (10mil records).
the load of the data runs quite fast but when starting creating the
triggers for foreign keys it takes forever.

Isnt there are a faster way. after all the triggers in the source db
already made sure the data was clean.

Thanks
Alex

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #1
14 13327
Alex wrote:
Hi,
I use pg_restore to load a previously dumped database. (10mil records).
the load of the data runs quite fast but when starting creating the
triggers for foreign keys it takes forever.

Isnt there are a faster way. after all the triggers in the source db
already made sure the data was clean.


You can try creating index/triggers first and load the data. At the end it will
take a while before you get a usable database with either approach but see what
works faster for you.

Personally I was in a situation where postgresql was hogging space while
creating index on a table that had 81M rows with 3GB disk footprint. I dropped
the table and recreated it. Also created index before loading data. The loading
was slow with this approach but it finished in 3 hours. And I had an updated
index as well. Just had to run vacuum over it.

Take your pick..

Shridhar

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postg resql.org so that your
message can get through to the mailing list cleanly

Nov 12 '05 #2
>>>>> "SD" == Shridhar Daithankar <sh************ *****@persisten t.co.in> writes:

SD> You can try creating index/triggers first and load the data. At the
SD> end it will take a while before you get a usable database with either
SD> approach but see what works faster for you.

The triggers and FK's don't do much at the time they are created.
They work upon update/insert/delete of data.

SD> footprint. I dropped the table and recreated it. Also created index
SD> before loading data. The loading was slow with this approach but it
SD> finished in 3 hours. And I had an updated index as well. Just had to
SD> run vacuum over it.

I cannot believe that this was faster than load data followed by
create index. Perhaps you needed to bump sort_mem so the index could
be created more efficiently. I also find that bumping up
checkpoint_segm ents to a high number speeds things up considerably.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D. Khera Communications, Inc.
Internet: kh***@kciLink.c om Rockville, MD +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 12 '05 #3
Vivek Khera wrote:
SD> footprint. I dropped the table and recreated it. Also created index
SD> before loading data. The loading was slow with this approach but it
SD> finished in 3 hours. And I had an updated index as well. Just had to
SD> run vacuum over it.

I cannot believe that this was faster than load data followed by
create index. Perhaps you needed to bump sort_mem so the index could
be created more efficiently. I also find that bumping up
checkpoint_segm ents to a high number speeds things up considerably.


Well, In my case speed wasn't the issue. I put $PGDATA on a 12/13GB partition
and loaded 3GB of table. When I went to create index, it ran out of rest of the
free space which was close to 9GB. Actually I killed it because when it started
it had 9GB free and when I killed it, there was only 150MB free left.

Oracle had same problems. With tablespaces set to auto extent it ate huge amount
of space.

I posted this earlier and Tom remarked it the same, saying that it should be
same one way or other.

Anyway the project abandoned all the database and went to in memory structures..:-)

Shridhar
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #4
On Fri, 3 Oct 2003, Vivek Khera wrote:
>> "SD" == Shridhar Daithankar <sh************ *****@persisten t.co.in> writes:


SD> You can try creating index/triggers first and load the data. At the
SD> end it will take a while before you get a usable database with either
SD> approach but see what works faster for you.

The triggers and FK's don't do much at the time they are created.
They work upon update/insert/delete of data.

SD> footprint. I dropped the table and recreated it. Also created index
SD> before loading data. The loading was slow with this approach but it
SD> finished in 3 hours. And I had an updated index as well. Just had to
SD> run vacuum over it.

I cannot believe that this was faster than load data followed by
create index. Perhaps you needed to bump sort_mem so the index could
be created more efficiently. I also find that bumping up
checkpoint_segm ents to a high number speeds things up considerably.


also, turning off fsync during the load helps a lot. Be sure to turn it
back on when you're done of course.
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 12 '05 #5
On Fri, Oct 03, 2003 at 01:06:26PM -0600, scott.marlowe wrote:
also, turning off fsync during the load helps a lot. Be sure to turn it
back on when you're done of course.


I'm not sure I understand why this is so. If I turn fsync off, it means
that I won't force the kernel to write WAL logs to disk, but they will
have to be written eventually. If you have tons of RAM it may well be
that the kernel will just keep dirty buffers in RAM, but if not there
should not be any difference. Am I missing something?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Investigac ión es lo que hago cuando no sé lo que estoy haciendo"
(Wernher von Braun)

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #6
>>>>> "sm" == scott marlowe <scott.marlow e> writes:

sm> also, turning off fsync during the load helps a lot. Be sure to turn it
sm> back on when you're done of course.

Only if you don't have a battery-backed cache on your RAID. If you
do, it won't likely make a big difference. For me it was about 2
seconds over a 4-hour restore.

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 12 '05 #7
On Fri, 3 Oct 2003, Alvaro Herrera wrote:
On Fri, Oct 03, 2003 at 01:06:26PM -0600, scott.marlowe wrote:
also, turning off fsync during the load helps a lot. Be sure to turn it
back on when you're done of course.


I'm not sure I understand why this is so. If I turn fsync off, it means
that I won't force the kernel to write WAL logs to disk, but they will
have to be written eventually. If you have tons of RAM it may well be
that the kernel will just keep dirty buffers in RAM, but if not there
should not be any difference. Am I missing something?


Yes, you are. Basically, with fsync on, things have to happen in order.

I.e.

write to WAL what you're gonna do. WAIT for confirmation on write
write the tuples out. wait for confirmation
checkpoint the WAL. wait for confirmation

Notice the wait for confirmation above. Without fsync, there's no wait,
you just write it all out at once, and hope the machine / database doesn't
ever crash in the middle of a transaction.

Give it a try, turn off fsync, run pgbench -c 4 -t 1000, then turn it back
on and see how much it slows down.

Pull the plug while the transactions are running with fsync on, and your
machine, assuming it has a meta-data journaling file system, will come
right back, and postgresql will replay the WAL files and you'll have a
nice consistent database.

turn off fsync, initiate many transactions, pull the plug, and look at
your corrupted database refuse to start on update.

Note that if you're running on IDE drives, you already ARE probably
running with fsync off if write caching is enabled, so you'll need to turn
it off (hdparm -W0 /dev/hdx in linux) to ensure fsync actually works.
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 12 '05 #8
On Fri, 3 Oct 2003, Vivek Khera wrote:
>> "sm" == scott marlowe <scott.marlow e> writes:


sm> also, turning off fsync during the load helps a lot. Be sure to turn it
sm> back on when you're done of course.

Only if you don't have a battery-backed cache on your RAID. If you
do, it won't likely make a big difference. For me it was about 2
seconds over a 4-hour restore.


True, very true. Have you done the "pull the plug" test on it to make
sure it really works, by the way?
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 12 '05 #9
"scott.marl owe" <sc***********@ ihs.com> writes:
Yes, you are. Basically, with fsync on, things have to happen in order.
I.e.
write to WAL what you're gonna do. WAIT for confirmation on write
write the tuples out. wait for confirmation
checkpoint the WAL. wait for confirmation
Not really. With fsync on, we *only* sync the WAL writes. Data writes
can happen whenever, so long as we know the corresponding WAL writes
went down first. We only wait for data writes to complete before
considering that a checkpoint is complete --- which is something that is
not in the main line of execution and doesn't block other activity.

This is one good reason for keeping WAL on a separate drive from the
data files --- you are then freeing the system to schedule data I/O as
optimally as it can.
Note that if you're running on IDE drives, you already ARE probably
running with fsync off if write caching is enabled, so you'll need to turn
it off (hdparm -W0 /dev/hdx in linux) to ensure fsync actually works.


It'd be interesting to think about whether a write-caching IDE drive
could safely be used for data storage, if WAL is elsewhere.

Right offhand I think the only problem is how to know when it's safe
to consider a checkpoint complete. Maybe all that would be needed is
a long enough time delay after issuing sync(2) in the checkpoint code.
Do these drives guarantee "data will be written within 30 seconds" or
something like that? Or can the delay be indefinite when load is heavy?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postg resql.org

Nov 12 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
1491
by: andy morrow | last post by:
hi, fairly new to postgres admin stuff...... i have a production machine that is running postgresql 7.1.3 also, there's a test machine which already had 7.0.3, and which i newly installed 7.3.3 so, i dumped all the databases from the production db and reloaded on the test,
0
2089
by: Joshua D. Drake | last post by:
Alright, we are testing pg_restore while restoring a 7GB database. This database has about 6GB of large objects and about 1Gb of textual data. The problem has been verified on PostgreSQL 7.3.2, PostgreSQL 7.3.3 and Mammoth PostgreSQL 7.3.4 . The command being used to backup the data is: /usr/local/pgsql/bin/pg_dump dominion -h localhost --superuser=postgres --create --format=c -b -o -f /backups/backup.sql.tar.gz The command has also...
14
3714
by: Ron Johnson | last post by:
Hi, While on the topic of "need for in-place upgrades", I got to think- ing how the pg_restore could be speeded up. Am I wrong in saying that in the current pg_restore, all of the indexes are created in serial? How about this new, multi-threaded way of doing the pg_restore: 0. On the command line, you specify how many threads you want.
1
2093
by: Nicolae Fieraru | last post by:
Hi All, I have a Table1 with ID, FirstName, Surname, Address1, Address2, PostCode, Suburb, Purchase. The table list all purchases made by all customers. A customer can have multiple purchases and there will be multiple records for the same customer. The address details might not be the same (for example Suburb might be W Gosford in one record and West Gosford in another record) I want to build a query which shows all the customers from...
3
1230
by: Paul Tomlinson | last post by:
All very simple app which loads a log file, parses it and displays the lines in a listview. I was playing about with some largeish files 800K and I noticed that the app would never return control, it would just sit and consume 100% CPU for aslong as I could be bothered to wait before I pskill's the app. Anyway I was playing and found that even the code below takes a matter of minutes to populate the control: ...
6
4973
by: Mike Charnoky | last post by:
Hello, I am currently using PostgreSQL v7.3.4 on a RedHat 8.0 system (2.4.23 kernel) using the ext3 filesystem. I am experiencing problems when performing a pg_restore using a file which is 2.3G in size. The dump, which seemed to run smoothly, was created using the -Fc option. When I perform the restore, the following error occurs before the pg_restore fails: pg_restore: error during file seek: Invalid argument pg_restore: ***...
7
7767
by: Tim Penhey | last post by:
Maybe it's just me, but I can't seem to get pg_restore to restore a database... I am running 8.0 beta 2 (using the dev3 installer) on Windows XP. I created a very simple database with one table and one function and dumped it out using: pg_dump -U postgres -F c -f test.dump test
1
10218
by: ruben | last post by:
Hi: I'm trying to dump tableA and restore it to tableB: $ ./pg_dump -Fc -t tableA databaseA -f tableA.dump -v $ ./pg_restore -t tableB -d databaseA tableA.dump -v pg_dump creates tableA.dump aparently well, but after running pg_restore without errors I cannot find any "tableB", what am I doing wrong?
7
1932
by: jabernet | last post by:
The following insert takes ages, although it started out pretty fast. I monitored it with DB2mon, and in the begin insert count was about 1000 per minute while after 300000 or so rows, it droped to about 20 rows per minute. Anyone any idea what could cause that? Regards, Janick INSERT INTO
0
8886
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8751
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
9257
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
9171
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9108
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
6702
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6010
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4781
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
2632
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.