Hi,
While on the topic of "need for in-place upgrades", I got to think-
ing how the pg_restore could be speeded up.
Am I wrong in saying that in the current pg_restore, all of the
indexes are created in serial?
How about this new, multi-threaded way of doing the pg_restore:
0. On the command line, you specify how many threads you want.
1. pg_restore creates the metadata.
2. pg_restore creates a set of streams/children that have the table
loads and index-creates (PKs and "regular" indexes), where each
streams does a set of tables.
3. pg_restore waits for all the streams to complete.
4. pg_restore then creates the FKs, stored procedures, triggers,
etc, etc.
Pardon if this has already been suggested and shot down as impossible
or too difficult, etc.
--
-----------------------------------------------------------------
Ron Johnson, Jr. ro***********@cox.net
Jefferson, LA USA
"Why should we not accept all in favor of woman suffrage to our
platform and association even though they be rabid pro-slavery
Democrats."
Susan B. Anthony, _History_of_Woman_Suffrage_ http://www.ifeminists.com/introducti...roduction.html
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org) 14 3613
A long time ago, in a galaxy far, far away, ro***********@cox.net (Ron Johnson) wrote: Pardon if this has already been suggested and shot down as impossible or too difficult, etc.
None of this is forcibly the problem.
The _big_ problem is that this isn't necessarily going to improve the
speed of the would-be restore.
I have brought servers to their knees by trying to restore multiple
databases at once.
The reason why this isn't likely to work is that the "multitasking"
that you need to maximize is not of the CPUs, but rather of the disk
drives.
Restoring a database involves, for each table:
1. Reading table data from the source file;
2. Writing data to the database file for the table;
3. After that, reading the database file data, and
4. Writing the sorted bits to the index file.
5. Along with all of this, HEFTY amounts of updates to WAL.
Ideal performance for this involves having 1. and 2. take place
concurrently, where a sequential set of reads from one disk lead to
more-or-less sequential writes to the "database drive" and to the "WAL
drive."
Multitask that and it is easy for the I/O patterns to go bad, where
one process is trying to sequence writes to one table whilst another
is writing to another, so you thereby have the disk seeking back and
forth between the one and the other.
There might well be some merit to having the capability to work on
several indices on the same table all at once, so that after reading
through the table once, _three_ indices are generated, mostly in
memory, and then consecutively spilled to disk. But it may be just as
well to let the shared memory cache do the work so that the table gets
read in once, to generate the first index, and then is already in
cache, which will surely help performance for generating the
subsequent indices.
That's a longwinded way of saying that doing multiple concurrent mass
data loads sucks pretty bad.
--
"aa454","@","freenet.carleton.ca" http://www3.sympatico.ca/cbbrowne/spiritual.html
"If it can't be abused, it's not freedom. A man may be in as just
possession of truth as of a city, and yet be forced to surrender."
-- Thomas Browne
Christopher Browne <cb******@acm.org> writes: Restoring a database involves, for each table: 1. Reading table data from the source file; 2. Writing data to the database file for the table; 3. After that, reading the database file data, and 4. Writing the sorted bits to the index file. 5. Along with all of this, HEFTY amounts of updates to WAL.
An idea that Marc and Jan and I were kicking around last night was to
offer a GUC option to disable writes to WAL. During initial data load,
you might as well go back to initdb if you have any failure, so why
bother with full ACID compliance? I'm not sure if the performance
benefit would be great enough to make it worth equipping the system
with such a large-caliber foot-gun, but it's something to think about.
I tend to agree with your doubts about parallelizing index builds,
but there may be scenarios where it's a win; it'd depend on your
relative CPU and disk horsepower. (Consider fast disk and multiple
not-so-fast CPUs; serial index builds can only use one of the CPUs.)
Question is, is it a big enough win for enough people to make it worth
supporting?
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
On Tue, 16 Sep 2003, Tom Lane wrote: Christopher Browne <cb******@acm.org> writes: Restoring a database involves, for each table: 1. Reading table data from the source file; 2. Writing data to the database file for the table; 3. After that, reading the database file data, and 4. Writing the sorted bits to the index file. 5. Along with all of this, HEFTY amounts of updates to WAL.
An idea that Marc and Jan and I were kicking around last night was to offer a GUC option to disable writes to WAL. During initial data load, you might as well go back to initdb if you have any failure, so why bother with full ACID compliance? I'm not sure if the performance benefit would be great enough to make it worth equipping the system with such a large-caliber foot-gun, but it's something to think about.
I tend to agree with your doubts about parallelizing index builds, but there may be scenarios where it's a win; it'd depend on your relative CPU and disk horsepower. (Consider fast disk and multiple not-so-fast CPUs; serial index builds can only use one of the CPUs.) Question is, is it a big enough win for enough people to make it worth supporting?
I'd say anything that improves postgresql's performance on medium to large
iron is worth at least trying. I imagine a 12 disk RAID5 with battery
backed RAID would likely be able to use parallelized index builds and even
loads.
Not so sure on whether the foot gun is a good idea. We already have .22
calibre foot gun (fsync) that makes for pretty big improvements in load
speed, and we see people all the time on General and Performance running
production servers with it turned off. You know as well as I do the
second we make WAL optional, some people are gonna start running
production servers with it.
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly tg*@sss.pgh.pa.us (Tom Lane) writes: Christopher Browne <cb******@acm.org> writes: Restoring a database involves, for each table: 1. Reading table data from the source file; 2. Writing data to the database file for the table; 3. After that, reading the database file data, and 4. Writing the sorted bits to the index file. 5. Along with all of this, HEFTY amounts of updates to WAL. An idea that Marc and Jan and I were kicking around last night was to offer a GUC option to disable writes to WAL. During initial data load, you might as well go back to initdb if you have any failure, so why bother with full ACID compliance? I'm not sure if the performance benefit would be great enough to make it worth equipping the system with such a large-caliber foot-gun, but it's something to think about.
That is a good thought.
To make it _marginally_ less unsafe, it might be an idea to only
enable this in some variation on single user mode so that it is made
manifestly clear that this is a "recovery" process and that nobody
else should be poking their head in on.
(Or, in keeping with the metaphor, it involves having a large flashing
light that says "Stay off the range - live fire artillery exercise
under way!" "Make sure this isn't pointed at Mabel or any other
highly trained monkeys or databases you want to keep around!"
<http://cbbrowne.com/info/images/shooting.jpg> Similarly, if my
brother gets behind the wheel of Certain Large Vehicles, it is good to
keep any lurking trees out of the way...
<http://www3.ns.sympatico.ca/coffee/pics/brdtnk2.gif> :-))
I tend to agree with your doubts about parallelizing index builds, but there may be scenarios where it's a win; it'd depend on your relative CPU and disk horsepower. (Consider fast disk and multiple not-so-fast CPUs; serial index builds can only use one of the CPUs.) Question is, is it a big enough win for enough people to make it worth supporting?
That one would take some benchmarking to see if/where it would be a
win. And "fast disk and multiple cheezy CPUs" sounds pretty atypical
in these days of Opterons and multiple GHz Intel hardware. That being
said, I could point to a _perfect_ server to try it out on, with the
problem that it isn't representative of the sort of 'production'
environment where you'd actually care about tuning the builds.
Certainly doesn't fit into the set of things I can afford to kludge
into my schedule :-(.
--
let name="aa454" and tld="freenet.carleton.ca" in String.concat "@" [name;tld];; http://www.ntlug.org/~cbbrowne/x.html
If nothing ever sticks to Teflon, how do they make Teflon stick to the
pan?
On Tue, 16 Sep 2003, scott.marlowe wrote: Not so sure on whether the foot gun is a good idea. We already have .22 calibre foot gun (fsync) that makes for pretty big improvements in load speed, and we see people all the time on General and Performance running production servers with it turned off. You know as well as I do the second we make WAL optional, some people are gonna start running production servers with it.
it shouldn't be too difficult to put some sort of restrictions on its
usual ... say if WAL disabled, max connections == 2? :)
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
ROTFLMAO!
That's just the trigger I needed for a belly laugh today. Thanks guys!
Marc G. Fournier wrote: On Tue, 16 Sep 2003, scott.marlowe wrote: Not so sure on whether the foot gun is a good idea. We already have .22 calibre foot gun (fsync) that makes for pretty big improvements in load speed, and we see people all the time on General and Performance running production servers with it turned off. You know as well as I do the second we make WAL optional, some people are gonna start running production servers with it.
it shouldn't be too difficult to put some sort of restrictions on its usual ... say if WAL disabled, max connections == 2? :)
---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
On Mon, 2003-09-15 at 21:55, Christopher Browne wrote: A long time ago, in a galaxy far, far away, ro***********@cox.net (Ron Johnson) wrote: Pardon if this has already been suggested and shot down as impossible or too difficult, etc. None of this is forcibly the problem.
The _big_ problem is that this isn't necessarily going to improve the speed of the would-be restore.
I have brought servers to their knees by trying to restore multiple databases at once.
And I've seen aggregate performance increase by parallelizing table
inserts. It all depends on the IO sub-system.
I'm sure that, as Scott says, the win would be bigger on systems
with fast IO systems (large RAID systems, or, even, once table-
spaces are implemented, multiple controllers).
The reason why this isn't likely to work is that the "multitasking" that you need to maximize is not of the CPUs, but rather of the disk drives.
Restoring a database involves, for each table: 1. Reading table data from the source file;
So, how about parallelized pg_dump, that can create multiple
dump files. Match the number of dump files with the number
of pg_restore threads.
Of course, as mentioned before, the better the IO system, the
more threads it would take to swamp.
2. Writing data to the database file for the table; 3. After that, reading the database file data, and 4. Writing the sorted bits to the index file. 5. Along with all of this, HEFTY amounts of updates to WAL.
Soooo, have pg_restore bypass the WAL. It's one big transaction
anyway, so why write to it anyway. (Thanks to Tom for this idea.)
Ideal performance for this involves having 1. and 2. take place concurrently, where a sequential set of reads from one disk lead to more-or-less sequential writes to the "database drive" and to the "WAL drive."
Multitask that and it is easy for the I/O patterns to go bad, where one process is trying to sequence writes to one table whilst another is writing to another, so you thereby have the disk seeking back and forth between the one and the other.
Well, the DBA has to be cluefull enough to match the number of
threads to the capacity of the IO system.
There might well be some merit to having the capability to work on several indices on the same table all at once, so that after reading through the table once, _three_ indices are generated, mostly in memory, and then consecutively spilled to disk. But it may be just as
I've been dreaming of that for years...
well to let the shared memory cache do the work so that the table gets read in once, to generate the first index, and then is already in cache, which will surely help performance for generating the subsequent indices.
If the table is, say, 20GB, then how much would the cache really
matter, since it would get full.
That's a longwinded way of saying that doing multiple concurrent mass data loads sucks pretty bad.
No, it doesn't suck, it's a big win, depending on the "capacity"
of the IO system.
--
-----------------------------------------------------------------
Ron Johnson, Jr. ro***********@cox.net
Jefferson, LA USA
After listening to many White House, Pentagon & CENTCOM
briefings in both Gulf Wars, it is my firm belief that most
"senior correspondents" either have serious agendas that don't
get shaken by facts, or are dumb as dog feces.
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
On Tue, 16 Sep 2003, Marc G. Fournier wrote:
On Tue, 16 Sep 2003, scott.marlowe wrote:
Not so sure on whether the foot gun is a good idea. We already have .22 calibre foot gun (fsync) that makes for pretty big improvements in load speed, and we see people all the time on General and Performance running production servers with it turned off. You know as well as I do the second we make WAL optional, some people are gonna start running production servers with it.
it shouldn't be too difficult to put some sort of restrictions on its usual ... say if WAL disabled, max connections == 2? :)
Even better, restrict logins to superuser only, that way we could still
have more than 2 things happening (think of a machine with a huge number
of disks in a RAID array kinda thing) or only updateable / writeable by
the superuser when in non-WAL mode.
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)
that works too ... basically, adding 'security' for a "load nly" mode
shouldn't be to difficult
On Tue, 16 Sep 2003, scott.marlowe wrote: On Tue, 16 Sep 2003, Marc G. Fournier wrote:
On Tue, 16 Sep 2003, scott.marlowe wrote:
Not so sure on whether the foot gun is a good idea. We already have .22 calibre foot gun (fsync) that makes for pretty big improvements in load speed, and we see people all the time on General and Performance running production servers with it turned off. You know as well as I do the second we make WAL optional, some people are gonna start running production servers with it.
it shouldn't be too difficult to put some sort of restrictions on its usual ... say if WAL disabled, max connections == 2? :)
Even better, restrict logins to superuser only, that way we could still have more than 2 things happening (think of a machine with a huge number of disks in a RAID array kinda thing) or only updateable / writeable by the superuser when in non-WAL mode.
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org
"scott.marlowe" <sc***********@ihs.com> writes: Not so sure on whether the foot gun is a good idea. We already have .22 calibre foot gun (fsync) that makes for pretty big improvements in load speed, and we see people all the time on General and Performance running production servers with it turned off. You know as well as I do the second we make WAL optional, some people are gonna start running production servers with it.
Well, yeah, they will. On a noncritical server, is that a sin? I mean,
if we offer fsync-off, it's not clear to me that offering WAL-off makes
the difference between venial and mortal sin. Seems to me we're just
putting the weapons in the display case. fsync = .22, WAL = .45,
but you shoot your foot with either one it's still gonna ruin your day.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
On 17 Sep 2003 at 0:16, Tom Lane wrote: "scott.marlowe" <sc***********@ihs.com> writes: Not so sure on whether the foot gun is a good idea. We already have .22 calibre foot gun (fsync) that makes for pretty big improvements in load speed, and we see people all the time on General and Performance running production servers with it turned off. You know as well as I do the second we make WAL optional, some people are gonna start running production servers with it.
Well, yeah, they will. On a noncritical server, is that a sin? I mean, if we offer fsync-off, it's not clear to me that offering WAL-off makes the difference between venial and mortal sin. Seems to me we're just putting the weapons in the display case. fsync = .22, WAL = .45, but you shoot your foot with either one it's still gonna ruin your day.
If somebopdy wants WAL effectively turned off, then can symlink WAL to a
ramdisk that has a GB under the carpet. That would offer all the "benefits" of
WAL being tunred off.
Why this new provision? Is it really that difficult to mount WAL on ramdisk
during reload?
See, we offer non-transaction mode mysql defaults to, already..:-)
Just a thought..
Bye
Shridhar
--
QOTD: "I used to go to UCLA, but then my Dad got a job."
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly
"Shridhar Daithankar" <sh*****************@persistent.co.in> writes: On 17 Sep 2003 at 0:16, Tom Lane wrote: Well, yeah, they will. On a noncritical server, is that a sin? I mean, if we offer fsync-off, it's not clear to me that offering WAL-off makes the difference between venial and mortal sin.
If somebopdy wants WAL effectively turned off, then can symlink WAL to a ramdisk that has a GB under the carpet. That would offer all the "benefits" of WAL being tunred off.
No, because the point of the proposal is to turn off WAL *temporarily*
during initial database load. Having to move WAL around and then back
again isn't simple, it isn't fool-proof, and it doesn't buy all of the
intended speed savings (the above might save some disk bandwidth but it
avoids none of the CPU expense associated with creating WAL entries).
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly
On Wed, 17 Sep 2003, Tom Lane wrote: "scott.marlowe" <sc***********@ihs.com> writes: Not so sure on whether the foot gun is a good idea. We already have .22 calibre foot gun (fsync) that makes for pretty big improvements in load speed, and we see people all the time on General and Performance running production servers with it turned off. You know as well as I do the second we make WAL optional, some people are gonna start running production servers with it.
Well, yeah, they will. On a noncritical server, is that a sin? I mean, if we offer fsync-off, it's not clear to me that offering WAL-off makes the difference between venial and mortal sin. Seems to me we're just putting the weapons in the display case. fsync = .22, WAL = .45, but you shoot your foot with either one it's still gonna ruin your day.
Now that you mention it, there are some areas where no WAL makes sense,
like proxy caching data stores and such, where errors aren't critical
because you can just grab the original data.
Postgresql, now with MySQL data integrity mode. :-)
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
We already have on TODO:
* Turn off after-change writes if fsync is disabled (?)
I am wondering if we should remove the fsync option completely and just
have a "os_crash_unsafe" option that turns off fsync and WAL, or at
least all the WAL used for os crash recovery --- I think we still need
WAL to recover from dirty buffers that didn't get written to the OS
cache.
---------------------------------------------------------------------------
Tom Lane wrote: Christopher Browne <cb******@acm.org> writes: Restoring a database involves, for each table: 1. Reading table data from the source file; 2. Writing data to the database file for the table; 3. After that, reading the database file data, and 4. Writing the sorted bits to the index file. 5. Along with all of this, HEFTY amounts of updates to WAL.
An idea that Marc and Jan and I were kicking around last night was to offer a GUC option to disable writes to WAL. During initial data load, you might as well go back to initdb if you have any failure, so why bother with full ACID compliance? I'm not sure if the performance benefit would be great enough to make it worth equipping the system with such a large-caliber foot-gun, but it's something to think about.
I tend to agree with your doubts about parallelizing index builds, but there may be scenarios where it's a win; it'd depend on your relative CPU and disk horsepower. (Consider fast disk and multiple not-so-fast CPUs; serial index builds can only use one of the CPUs.) Question is, is it a big enough win for enough people to make it worth supporting?
regards, tom lane
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend
--
Bruce Momjian | http://candle.pha.pa.us pg***@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: andy morrow |
last post by:
hi,
fairly new to postgres admin stuff......
i have a production machine that is running postgresql 7.1.3
also, there's a test machine which already had 7.0.3, and which i newly
installed...
|
by: Joshua D. Drake |
last post by:
Alright, we are testing pg_restore while restoring a 7GB database. This
database has about 6GB of large objects
and about 1Gb of textual data. The problem has been verified on
PostgreSQL 7.3.2,...
|
by: nednieuws | charles |
last post by:
What does this error mean:
pg_restore: creating TABLE author
pg_restore: creating SEQUENCE author_id
pg_restore: could not execute query: ERROR: parser: parse error at or near "BY" at...
|
by: Mike Charnoky |
last post by:
Hello,
I am currently using PostgreSQL v7.3.4 on a RedHat 8.0 system (2.4.23 kernel)
using the ext3 filesystem. I am experiencing problems when performing a
pg_restore using a file which is...
|
by: Plant Thomas |
last post by:
Hi,
I have a problem restoring a database from Postgresql 7.3 ( backup
created with 'pg_dump -Ft -b > test.tar' ) to a new database on
PostgreSQL 7.4.3 on Cygwin. I get the following error:
...
|
by: Sven Willenberger |
last post by:
Created a pg_dump with Fc (custom format compression) that resulted in a
300+MB file. Now trying to pg_restore this thing fails with either an
out of memory error (as in the subject line) on...
|
by: Tim Penhey |
last post by:
Maybe it's just me, but I can't seem to get pg_restore to restore a
database...
I am running 8.0 beta 2 (using the dev3 installer) on Windows XP.
I created a very simple database with one...
|
by: Phil Endecott |
last post by:
Dear Postgresql people,
For the first time I'm trying to use pg_restore to do a partial restore.
It looks as if it should be easy: pg_restore -l, filter out the
required tables, then pg_restore...
|
by: ruben |
last post by:
Hi:
I'm trying to dump tableA and restore it to tableB:
$ ./pg_dump -Fc -t tableA databaseA -f tableA.dump -v
$ ./pg_restore -t tableB -d databaseA tableA.dump -v
pg_dump creates...
|
by: erikbower65 |
last post by:
Using CodiumAI's pr-agent is simple and powerful. Follow these steps:
1. Install CodiumAI CLI: Ensure Node.js is installed, then run 'npm install -g codiumai' in the terminal.
2. Connect to...
|
by: linyimin |
last post by:
Spring Startup Analyzer generates an interactive Spring application startup report that lets you understand what contributes to the application startup time and helps to optimize it. Support for...
|
by: erikbower65 |
last post by:
Here's a concise step-by-step guide for manually installing IntelliJ IDEA:
1. Download: Visit the official JetBrains website and download the IntelliJ IDEA Community or Ultimate edition based on...
|
by: kcodez |
last post by:
As a H5 game development enthusiast, I recently wrote a very interesting little game - Toy Claw ((http://claw.kjeek.com/))。Here I will summarize and share the development experience here, and hope it...
|
by: isladogs |
last post by:
The next Access Europe meeting will be on Wednesday 6 Sept 2023 starting at 18:00 UK time (6PM UTC+1) and finishing at about 19:15 (7.15PM)
The start time is equivalent to 19:00 (7PM) in Central...
|
by: DJRhino1175 |
last post by:
When I run this code I get an error, its Run-time error# 424 Object required...This is my first attempt at doing something like this. I test the entire code and it worked until I added this -
If...
|
by: DJRhino |
last post by:
Private Sub CboDrawingID_BeforeUpdate(Cancel As Integer)
If = 310029923 Or 310030138 Or 310030152 Or 310030346 Or 310030348 Or _
310030356 Or 310030359 Or 310030362 Or...
|
by: lllomh |
last post by:
Define the method first
this.state = {
buttonBackgroundColor: 'green',
isBlinking: false, // A new status is added to identify whether the button is blinking or not
}
autoStart=()=>{
|
by: Mushico |
last post by:
How to calculate date of retirement from date of birth
| |