disaster recovery - PostgreSQL Database

Jason Tesser

We are evaluating Postgres and would like some input about disaster recovery. I know in MsSQL they have a feature called transactional
logs that would enable a database to be put back together based off those logs. Does Postgres do anything like this? I saw in the documentation
transactional logging but I don't know if it is the same. Where can I find info about disaster recovery in Postgres. Thank you in advance
for any info given.

Jason Tesser
Web/Multimedia Programmer
Northland Ministries Inc.
(715)324-6900 x3050
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 12 '05 #1

Subscribe Reply

3796

Alex Satrapa

Jason Tesser wrote:

We are evaluating Postgres and would like some input about disaster recovery.
I'm going to try to communicate what I understand, and other list
members can correct me at their selected level of vehemence :)
Please send corrections to the list - I may take days to post follow-ups.
I know in MsSQL they have a feature called transactional
logs that would enable a database to be put back together based off those logs.

A roughly parallel concept in PostgreSQL (what's the correct
capitalisation and spelling?) is the "Write Ahead Log" (WAL). There is
also a quite dissimilar concept called the query log - which is good to
inspect for common queries to allow database tuning, but is not replay-able.

The theory is that given a PostgreSQL database and the respective WAL,
you can recreate the database to the time that the last entry of the WAL
was written to disk.

Some caveats though:
1) Under Linux, if you have the file system containing the WAL mounted
with asynchronous writes, "all bets are off". The *BSD crowd (that I
know of) take great pleasure in constantly reminding me that if the
power fails, my file system will be in an indeterminate state - things
could be half-written all over the file system.
2) If you're using IDE drives, under any operating system, and have
write-caching turned on in the IDE drives themselves, again "all bets
are off"
3) If you're using IDE drives behind a RAID controller, YMMV.

So to play things safe, one recommendation to ensure database robustness
is to:
1) Store the WAL on a separate physical drive
2) Under Linux, mount that file system with synchronous writes (ie:
fsync won't return until the data is actually, really, written to the
interface)
3) If using IDE drives, turn off write caching on the WAL volume so that
you know data is actually written to disk when the drive claims it is.

Note that disabling write caching will impact write performance
significantly. Most people *want* write caching turned on for
throughput-critical file systems, and turned off for mission-critical
file systems.

Note too that SCSI systems tend to have no "write cache" as such, since
they use "tagged command queues". The OS can say to the SCSI drive
something that is effectively, "here are 15 blocks of data to write to
disk, get back to me when the last one is actually written to the
media", and continue on its way. On IDE, the OS can only have one
command outstanding - the purpose of the write cache is to allow
multiple commands to be received and "acknowledg ed" before any data is
actually written to the media.

When the host is correctly configured, you can recover a PostgreSQL
database from a hardware failure by recovering the database file itself
and "replaying" the WAL to that database.

Read more about WAL here:
http://www.postgresql.org/docs/current/static/wal.html

Regards
Alex
PS: Please send corrections to the list
PPS: Don't forget to include "fire drills" as part of your disaster
recovery plan - get plenty of practice at recovering a database from a
crashed machine so that you don't make mistakes when the time comes that
you actually need to do it!
PPPS: And follow your own advice ;)
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 12 '05 #2

Doug McNaught

Alex Satrapa <al**@lintelsys .com.au> writes:

1) Under Linux, if you have the file system containing the WAL mounted
with asynchronous writes, "all bets are off". The *BSD crowd (that I
know of) take great pleasure in constantly reminding me that if the
power fails, my file system will be in an indeterminate state - things
could be half-written all over the file system.

This is pretty out of date. If you use a journaling filesystem
(there are four solid ones available and modern distros use them)
metadata is consistent and crash recovery is fast.

Even with ext2, WAL files are preallocated and PG calls fsync() after
writing, so in practice it's not likely to cause problems.

-Doug

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postg resql.org

Nov 12 '05 #3

Bruce Momjian

Alex Satrapa wrote:

Some caveats though:
1) Under Linux, if you have the file system containing the WAL mounted
with asynchronous writes, "all bets are off". The *BSD crowd (that I
know of) take great pleasure in constantly reminding me that if the
power fails, my file system will be in an indeterminate state - things
could be half-written all over the file system.

This is only a problem for ext2. Ext3, Reiser, XFS, JFS are all fine,
though you get better performance from them by mounting them
'writeback'.

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.ph a.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 12 '05 #4

Tom Lane

Doug McNaught <do**@mcnaught. org> writes:

Alex Satrapa <al**@lintelsys .com.au> writes:
1) Under Linux, if you have the file system containing the WAL mounted
with asynchronous writes, "all bets are off".

...
Even with ext2, WAL files are preallocated and PG calls fsync() after
writing, so in practice it's not likely to cause problems.

Um. I took the reference to "mounted with async write" to mean a
soft-mounted NFS filesystem. It does not matter which OS you think is
the one true OS --- running a database over NFS is the act of someone
with a death wish. But, yeah, soft-mounted NFS is a particularly
malevolent variety ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #5

Doug McNaught

Tom Lane <tg*@sss.pgh.pa .us> writes:

Doug McNaught <do**@mcnaught. org> writes:
Alex Satrapa <al**@lintelsys .com.au> writes:
1) Under Linux, if you have the file system containing the WAL mounted
with asynchronous writes, "all bets are off".

...
Even with ext2, WAL files are preallocated and PG calls fsync() after
writing, so in practice it's not likely to cause problems.

Um. I took the reference to "mounted with async write" to mean a
soft-mounted NFS filesystem. It does not matter which OS you think is
the one true OS --- running a database over NFS is the act of someone
with a death wish. But, yeah, soft-mounted NFS is a particularly
malevolent variety ...

I took it as a garbled understanding of the "Linux does async metadata
updates" criticism. Which is true for ext2, but was never the
show-stopper some BSD-ers wanted it to be. :)

-Doug

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 12 '05 #6

Alex Satrapa

Doug McNaught wrote:

I took it as a garbled understanding of the "Linux does async metadata
updates" criticism. Which is true for ext2, but was never the
show-stopper some BSD-ers wanted it to be. :)

I have on several occasions demonstrated how "bad" asynchronous writes
are to a BSD-bigot by pulling the plug on a mail server (having a
terminal on another machine showing the results of tail -f
/var/log/mail.log), then showing that when the machine comes back up the
most we've ever lost is one message

From the BSD-bigot's point of view, this is equivalent to the end of
the world as we know it.

From my point of view, it's just support for my demands to have each
mission-critical server supported by a UPS, if not redundant power
supplies and two UPSes.

Alex
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 12 '05 #7

Craig O'Shannessy

From my point of view, it's just support for my demands to have each
mission-critical server supported by a UPS, if not redundant power
supplies and two UPSes.

Never had a kernel panic? I've had a few. Probably flakey hardware. I
feel safer since journalling file systems hit linux.
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 12 '05 #8

Alex Satrapa

Craig O'Shannessy wrote:

Never had a kernel panic? I've had a few. Probably flakey hardware. I
feel safer since journalling file systems hit linux.

The only kernel panic I've ever had was when playing with a development
version of the kernel (2.3.x). Never played with development kernels
since then - I'm a user, not a developer.

All the outages I've experienced so far have been due to external
factors such as (in order of frequency):
- Colocation facility technicians repatching panels and
putting my connection "back" into the wrong port
- Colo facility power failure (we were told they had dual
redundant diesel+battery UPS, but they only had one, the
second was being installed "any time now")
- End user's machines crashing
- Client software crashing
- Colo facility techs ripping power cables or network
cables while "cleaning up" cable trays
- Hard drive failure (hard, fast and very real - one
revolution the drive was working, the next it was a
charred blackened mess of fibreglass, silicon and
aluminium)

I have to admit that in none of those cases would synchronous vs
asynchronous, journalling vs non-journalling or *any* file system
decision have made the slightest jot of a difference to the integrity of
my data.

I've yet to experience a CPU failure (touch wood!).
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 12 '05 #9

Christopher Browne

al**@lintelsys. com.au (Alex Satrapa) writes:

Craig O'Shannessy wrote:
Never had a kernel panic? I've had a few. Probably flakey hardware. I
feel safer since journalling file systems hit linux.

The only kernel panic I've ever had was when playing with a
development version of the kernel (2.3.x). Never played with
development kernels since then - I'm a user, not a developer.

You apparently don't "get out enough;" while Linux is certainly a lot
more reliable than systems that need to be rebooted every few days so
that they don't spontaneously reboot, perfection is not to be had:

1. Flakey hardware can _always_ take things down.

A buggy video card and/or X driver can and will take systems down
in a flash. (And this problem shouldn't leave *BSD folk feeling
comfortable; they have no "silver bullet" against this
problem...)

2. Devices that pretend to be SCSI devices have a history of being
troublesome. I have encountered kernel panics as a result of
IDE-CDROMs, USB memory card readers, and the USB Palm interface
going 'flakey.'

3. There's an oft-heavily-loaded system that I have been working with
that has occasionally kernel paniced. Haven't been able to get
enough error messages out of it to track it down.

Note that none of these scenarios have anything to do with
"developmen t kernels;" in ALL these cases, I have experienced the
problems when running "production " kernels.

There have been times when I have tracked "bleeding edge" kernels; I
never, in those times, experienced data loss, although there have,
historically, been experimental versions which did break so badly as
to trash filesystems.

I have seen a LOT more kernel panics in "production " versions than in
"experiment al" versions, personally; the notion that avoiding "dev"
kernels will eliminate kernel panics is just fantasy.

Production kernels can't prevent disk hardware from being flakey;
that, alone, is point enough.
--
let name="cbbrowne" and tld="libertyrms .info" in String.concat "@" [name;tld];;
<http://dev6.int.libert yrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)

Nov 12 '05 #10

Similar topics

4427

DB2 UDB recovery

by: Neil Truby | last post by:

There's something clearly missing in my understanding of recovery: I set up a small sample datavase and deleted all the rows from a table. Crucially, I omitted the "commit". I then shut down db2, and copied everything under the db2inst1/db2inst1/NODE0000 directory to another server. When I restarted db2 on the first server the rows were still missing. But on the second server they were still there :-( I repeated the exercise several...

DB2 Database

9526

crash recovery

by: xixi | last post by:

i have db2 udb v8.1 on windows 64 bit 2003 server, after db2 server start , i found this in the db2diag.log, is this error? 2004-05-05-15.28.30.780000 Instance:DB2 Node:000 PID:1692(db2syscs.exe) TID:2860 Appid:AC10040A.GD5F.00FC56D8BEC5 base sys utilities sqledint Probe:30 Crash Recovery is needed. 2004-05-05-15.28.31.890000 Instance:DB2 Node:000

DB2 Database

2625

container recovery

by: jignesh shah | last post by:

Hi all, Is there a way to recover a single container if its been corrupted or mark bad without restoring whole tablespace? environment: db28.1/aix5.1/tsm/rs-6000. Regards Jignesh

DB2 Database

2506

Online backup to restore onto another system (or for disaster recovery): how to?

by: Konstantin Andreev | last post by:

Hello, everybody. I've spent a lot of time reading "DB2 Information Center" and Raul Chong's book "Understanding DB2. Learning Visually with Examples", but still unable to answer this simple question. I need to perform the full database backup for at least these two goals: - to restore data onto developers' database system. - to ensure disaster recovery

DB2 Database

3637

Crash recovery

by: Racerx | last post by:

Hi All : I use db2 8.1 fixpack 3 on AIX. I recieved the following message in the diaglog ====================================================== ADM7513W Database manager has started. 2007-01-13-18.55.08.262174 Instance:db2inst1 Node:000 PID:467078(db2agent (mumar) 0) TID:1 Appid:GA010302.O03F.01101B9A3444 base sys utilities sqledint Probe:30

DB2 Database

1773

Disaster recovery in MS Access

by: sureshabi | last post by:

Hello All, I am on a mission critical MS ACCESS database, which has runout of storage space of 2GB. I need to know the following to do a diaster recovery of the existing situation if situation goes bad to worse. 1. How can i Purge Old Data? 2. How can i do nightly backup to the network 3. How can i stablize the current database? 4. What are the pitfalls of storing Image in MS Access database? Anyhelp to website links, greatly...

Microsoft Access / VBA

1859

PLEASE SOMEBODY KINDLY HELP ME: How I can make recovery CDs from my USB HDD?

by: Tin | last post by:

I bought a laptop and burned 4 recovery CDs for recovery purpose. Instead of burning as disc images, I just copied and pasted these 4 CDs to my USB HDD as 4 folders called "RecoveryCD 1", "RecoveryCD 2", "RecoveryCD 3" and "RecoveryCD 4". Now my laptop got problem and I lost my 4 recovery CDs. All I have now is 4 recovery folders in my USB HDD. I burned another 4 CDs as data discs from my USB HDD, but it didn't work out (it didn't boot...

PHP

1269

DB2 UDB Recovery after Total Media Failure or Server Loss due to Extreme Disaster

by: APP1MVF | last post by:

First let me start by saying I am not a DBA nor do I claim to be one, ok with that out of the way I am looking for some help understanding DB2 UDB Recovery after Total Media Failure or Server Loss due to Extreme Disaster. Currently on my system hot backups are being run on a daily basis with cold backups being executed during a down window on the weekend. The backup data is sent to tape which is then sent off site for safety. Hypothetically,...

DB2 Database

1788

full recovery disk

by: =?Utf-8?B?c3BhcmtsZWJhbg==?= | last post by:

My recovery disk on vista is almost full. I have performed a back up, deleted all but the most recent recovery point, done a disk clean up and also compressed the recovery disk. It is STILL almost full. What do I do? Can I (and how do I) delete the recovery disk? HELP! -- sparkleban

.NET Framework

8432

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...

General

8343

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...

Windows Server

8856

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...

C / C++

8545

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...

Windows Server

8633

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...

General

6185

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...

Microsoft Access / VBA

4179

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

4347

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

1747

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

General