Bytes IT Community

corruption diag/recovery, pg_dump crash

We are seeing what looks like pgsql data file corruption across multiple
clusters on a RAID5 partition on a single Red Hat Linux 2.4 server running
7.3.4. The system has ~20 clusters installed with a mix of 7.2.3, 7.3.2, and
7.3.4 (mostly 7.3.4), 10GB RAM, 76GB on a RAID5, and dual CPUs, and is very
busy with hundreds and sometimes > 1000 simultaneous connections. After ~250
days of continuous, flawless uptime, we recently began seeing major
performance degradation accompanied by messages like the following:

ERROR: Invalid page header in block NN of some_relation (10-15 instances)

ERROR: XLogFlush: request 38/5E659BA0 is not satisfied ... (1 instance
repeated many times)

I think I've been able to repair most of the "Invalid page header" errors by
rebuilding indices or truncating and reloading table data. The XLogFlush
error was occurring for a particular index, and a drop/reload has at least
stopped that error. Now, a pg_dump error is occurring on one cluster,
preventing a successful dump. Of course, it went unnoticed long enough for
our good online backups to roll over, and the bazillion-dollar
offline/offsite backup system wasn't working properly. Here's the pg_dump
output, edited to protect the guilty:

pg_dump: PANIC: open of .../data/pg_clog/04E5 failed: No such file or
directory
pg_dump: lost synchronization with server, resetting connection
pg_dump: WARNING: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted ... blah blah
pg_dump: SQL command to dump the contents of table "sometable" failed:
PQendcopy() failed.
pg_dump: Error message from server: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: The command was: COPY public.sometable ("key", ...) TO stdout;
pg_dumpall: pg_dump failed on somedb, exiting

Why that 04E5 file is missing, I haven't a clue. I've attached an "ls -l"
for the pg_clog dir.

Past list discussions suggest this may be an elusive hardware issue. We did
find a msg in /var/log/messages...

kernel: ISR called reentrantly!!

which some here have found newsgroup reports linking to some sort of
RAID/BIOS issue. We've taken the machine offline and conducted extensive
hardware diagnostics on the RAID controller, filesystem (fsck), and RAM, and
found no further indication of hardware failure. The machine had run
flawlessly for these ~20 clusters for ~250 days until cratering yesterday
amidst these errors and absurd system (disk) IO sluggishness. Upon reboot
and upgrades, the machine continues to exhibit infrequent corruption (or
corruption that is infrequently discovered). On the advice of our hardware
vendor's (Dell) support folks, we've upgraded our kernel (now
2.4.20-24.7bigmem), several drivers, and the RAID controller firmware,
rebooted, etc. The disk IO sluggishness has largely diminished, but we're
still seeing the Invalid page header errors pop up anew, albeit
infrequently. The XLogFlush error seems to have gone away with the
reconstruction of an index.

Current plan is to get as much data recovered as possible, and then do
significant hardware replacements (along with more frequent planned reboots
and more vigilant backups).

Any clues/suggestions for recovering this data or fixing other issues would
be greatly appreciated.

TIA.
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 12 '05 #1
7 Replies


Maybe worth mentioning: the system has one 7.2.3 cluster, five 7.3.2
clusters, and twelve 7.3.4 clusters, all with data on the same
partition/device, and all corruption has occurred on only five of the
twelve 7.3.4 clusters.

TIA.

On Saturday December 6 2003 2:30, Ed L. wrote:
[original post quoted in full; snipped]

Nov 12 '05 #2

While I can't help you with most of your message, the pg_clog problem is an
easier one. Basically, creating a file with that name containing 256KB of
zeroes will let postgres complete the dump.

*HOWEVER*, what this means is that one of the tuple headers in the database
refers to a nonexistent transaction. So there is definitely some kind of
corruption going on there.
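For the record, a zero-filled segment of the right size can be created along
these lines (a sketch only: the directory below is a stand-in, since on a
real cluster you'd write into its actual pg_clog directory, ideally with the
postmaster stopped; zeroed status bits just mean the bogus transaction reads
as never committed):

```shell
# Stand-in directory; on a real cluster this would be pg_clog under the
# cluster's data directory.
CLOGDIR=/tmp/fake_pg_clog
mkdir -p "$CLOGDIR"

# One clog segment is 256KB (32 pages of 8KB), zero-filled.
dd if=/dev/zero of="$CLOGDIR/04E5" bs=8192 count=32 2>/dev/null

ls -l "$CLOGDIR/04E5"   # should show a 262144-byte file
```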

Hope this helps,

On Sat, Dec 06, 2003 at 02:30:37PM -0700, Ed L. wrote:
[original post quoted in full; snipped]

The attached "ls -l" of the pg_clog dir:

total 64336
-rw------- 1 pgdba pg 262144 Aug 12 18:39 0000
-rw------- 1 pgdba pg 262144 Aug 14 11:56 0001
-rw------- 1 pgdba pg 262144 Aug 14 20:22 0002
-rw------- 1 pgdba pg 262144 Aug 15 16:01 0003
-rw------- 1 pgdba pg 262144 Aug 15 23:08 0004
-rw------- 1 pgdba pg 262144 Aug 16 05:33 0005
-rw------- 1 pgdba pg 262144 Aug 16 11:42 0006
-rw------- 1 pgdba pg 262144 Aug 16 18:25 0007
-rw------- 1 pgdba pg 262144 Aug 16 23:57 0008
-rw------- 1 pgdba pg 262144 Aug 17 08:16 0009
-rw------- 1 pgdba pg 262144 Aug 17 14:31 000A
-rw------- 1 pgdba pg 262144 Aug 17 20:24 000B
-rw------- 1 pgdba pg 262144 Aug 17 23:57 000C
-rw------- 1 pgdba pg 262144 Aug 18 03:33 000D
-rw------- 1 pgdba pg 262144 Aug 18 13:01 000E
-rw------- 1 pgdba pg 262144 Aug 19 13:03 000F
-rw------- 1 pgdba pg 262144 Aug 19 18:54 0010
-rw------- 1 pgdba pg 262144 Aug 19 23:19 0011
-rw------- 1 pgdba pg 262144 Aug 20 04:29 0012
-rw------- 1 pgdba pg 262144 Aug 20 12:50 0013
-rw------- 1 pgdba pg 262144 Aug 20 15:00 0014
-rw------- 1 pgdba pg 262144 Aug 20 23:29 0015
-rw------- 1 pgdba pg 262144 Aug 21 11:50 0016
-rw------- 1 pgdba pg 262144 Aug 21 16:36 0017
-rw------- 1 pgdba pg 262144 Aug 21 21:36 0018
-rw------- 1 pgdba pg 262144 Aug 22 03:24 0019
-rw------- 1 pgdba pg 262144 Aug 22 09:16 001A
-rw------- 1 pgdba pg 262144 Aug 22 15:59 001B
-rw------- 1 pgdba pg 262144 Aug 23 06:39 001C
-rw------- 1 pgdba pg 262144 Aug 24 01:10 001D
-rw------- 1 pgdba pg 262144 Aug 24 15:53 001E
-rw------- 1 pgdba pg 262144 Aug 25 09:54 001F
-rw------- 1 pgdba pg 262144 Aug 25 14:37 0020
-rw------- 1 pgdba pg 262144 Aug 26 01:29 0021
-rw------- 1 pgdba pg 262144 Aug 26 13:13 0022
-rw------- 1 pgdba pg 262144 Aug 26 18:26 0023
-rw------- 1 pgdba pg 262144 Aug 27 10:14 0024
-rw------- 1 pgdba pg 262144 Aug 27 17:10 0025
-rw------- 1 pgdba pg 262144 Aug 28 08:31 0026
-rw------- 1 pgdba pg 262144 Aug 28 15:21 0027
-rw------- 1 pgdba pg 262144 Aug 29 06:11 0028
-rw------- 1 pgdba pg 262144 Aug 29 13:56 0029
-rw------- 1 pgdba pg 262144 Aug 30 03:51 002A
-rw------- 1 pgdba pg 262144 Aug 30 17:15 002B
-rw------- 1 pgdba pg 262144 Aug 31 11:31 002C
-rw------- 1 pgdba pg 262144 Sep 1 04:59 002D
-rw------- 1 pgdba pg 262144 Sep 1 17:01 002E
-rw------- 1 pgdba pg 262144 Sep 2 09:52 002F
-rw------- 1 pgdba pg 262144 Sep 2 16:24 0030
-rw------- 1 pgdba pg 262144 Sep 3 07:07 0031
-rw------- 1 pgdba pg 262144 Sep 3 13:27 0032
-rw------- 1 pgdba pg 262144 Sep 4 04:25 0033
-rw------- 1 pgdba pg 262144 Sep 4 13:11 0034
-rw------- 1 pgdba pg 262144 Sep 5 02:11 0035
-rw------- 1 pgdba pg 262144 Sep 5 12:31 0036
-rw------- 1 pgdba pg 262144 Sep 6 01:18 0037
-rw------- 1 pgdba pg 262144 Sep 6 17:12 0038
-rw------- 1 pgdba pg 262144 Sep 7 12:01 0039
-rw------- 1 pgdba pg 262144 Sep 8 08:00 003A
-rw------- 1 pgdba pg 262144 Sep 8 14:32 003B
-rw------- 1 pgdba pg 262144 Sep 9 06:14 003C
-rw------- 1 pgdba pg 262144 Sep 9 13:12 003D
-rw------- 1 pgdba pg 262144 Sep 9 20:56 003E
-rw------- 1 pgdba pg 262144 Sep 10 09:26 003F
-rw------- 1 pgdba pg 262144 Sep 10 14:27 0040
-rw------- 1 pgdba pg 262144 Sep 10 20:29 0041
-rw------- 1 pgdba pg 262144 Sep 11 03:29 0042
-rw------- 1 pgdba pg 262144 Sep 11 12:00 0043
-rw------- 1 pgdba pg 262144 Sep 11 20:27 0044
-rw------- 1 pgdba pg 262144 Sep 12 09:01 0045
-rw------- 1 pgdba pg 262144 Sep 12 15:37 0046
-rw------- 1 pgdba pg 262144 Sep 13 07:29 0047
-rw------- 1 pgdba pg 262144 Sep 13 18:59 0048
-rw------- 1 pgdba pg 262144 Sep 14 12:05 0049
-rw------- 1 pgdba pg 262144 Sep 15 07:17 004A
-rw------- 1 pgdba pg 262144 Sep 15 13:53 004B
-rw------- 1 pgdba pg 262144 Sep 16 01:09 004C
-rw------- 1 pgdba pg 262144 Sep 16 11:18 004D
-rw------- 1 pgdba pg 262144 Sep 16 18:46 004E
-rw------- 1 pgdba pg 262144 Sep 17 09:17 004F
-rw------- 1 pgdba pg 262144 Sep 17 16:45 0050
-rw------- 1 pgdba pg 262144 Sep 18 07:39 0051
-rw------- 1 pgdba pg 262144 Sep 18 14:20 0052
-rw------- 1 pgdba pg 262144 Sep 19 01:38 0053
-rw------- 1 pgdba pg 262144 Sep 19 12:05 0054
-rw------- 1 pgdba pg 262144 Sep 19 22:39 0055
-rw------- 1 pgdba pg 262144 Sep 20 13:55 0056
-rw------- 1 pgdba pg 262144 Sep 21 09:02 0057
-rw------- 1 pgdba pg 262144 Sep 22 02:47 0058
-rw------- 1 pgdba pg 262144 Sep 22 12:42 0059
-rw------- 1 pgdba pg 262144 Sep 22 21:57 005A
-rw------- 1 pgdba pg 262144 Sep 23 10:28 005B
-rw------- 1 pgdba pg 262144 Sep 23 18:00 005C
-rw------- 1 pgdba pg 262144 Sep 24 08:52 005D
-rw------- 1 pgdba pg 262144 Sep 24 15:14 005E
-rw------- 1 pgdba pg 262144 Sep 25 04:16 005F
-rw------- 1 pgdba pg 262144 Sep 25 12:17 0060
-rw------- 1 pgdba pg 262144 Sep 25 20:17 0061
-rw------- 1 pgdba pg 262144 Sep 26 10:07 0062
-rw------- 1 pgdba pg 262144 Sep 26 16:24 0063
-rw------- 1 pgdba pg 262144 Sep 27 09:20 0064
-rw------- 1 pgdba pg 262144 Sep 28 00:27 0065
-rw------- 1 pgdba pg 262144 Sep 28 16:17 0066
-rw------- 1 pgdba pg 262144 Sep 29 09:45 0067
-rw------- 1 pgdba pg 262144 Sep 29 16:37 0068
-rw------- 1 pgdba pg 262144 Sep 30 07:44 0069
-rw------- 1 pgdba pg 262144 Sep 30 15:03 006A
-rw------- 1 pgdba pg 262144 Oct 1 05:59 006B
-rw------- 1 pgdba pg 262144 Oct 1 12:52 006C
-rw------- 1 pgdba pg 262144 Oct 1 22:19 006D
-rw------- 1 pgdba pg 262144 Oct 2 10:53 006E
-rw------- 1 pgdba pg 262144 Oct 2 19:28 006F
-rw------- 1 pgdba pg 262144 Oct 3 10:18 0070
-rw------- 1 pgdba pg 262144 Oct 3 19:11 0071
-rw------- 1 pgdba pg 262144 Oct 4 12:42 0072
-rw------- 1 pgdba pg 262144 Oct 5 08:24 0073
-rw------- 1 pgdba pg 262144 Oct 6 00:03 0074
-rw------- 1 pgdba pg 262144 Oct 6 11:57 0075
-rw------- 1 pgdba pg 262144 Oct 6 19:46 0076
-rw------- 1 pgdba pg 262144 Oct 7 09:43 0077
-rw------- 1 pgdba pg 262144 Oct 7 17:09 0078
-rw------- 1 pgdba pg 262144 Oct 8 07:33 0079
-rw------- 1 pgdba pg 262144 Oct 8 13:34 007A
-rw------- 1 pgdba pg 262144 Oct 8 18:41 007B
-rw------- 1 pgdba pg 262144 Oct 8 23:28 007C
-rw------- 1 pgdba pg 262144 Oct 9 09:51 007D
-rw------- 1 pgdba pg 262144 Oct 9 14:22 007E
-rw------- 1 pgdba pg 262144 Oct 9 17:04 007F
-rw------- 1 pgdba pg 262144 Oct 10 06:56 0080
-rw------- 1 pgdba pg 262144 Oct 10 12:31 0081
-rw------- 1 pgdba pg 262144 Oct 10 18:19 0082
-rw------- 1 pgdba pg 262144 Oct 11 10:22 0083
-rw------- 1 pgdba pg 262144 Oct 12 02:29 0084
-rw------- 1 pgdba pg 262144 Oct 12 17:43 0085
-rw------- 1 pgdba pg 262144 Oct 13 09:49 0086
-rw------- 1 pgdba pg 262144 Oct 13 17:00 0087
-rw------- 1 pgdba pg 262144 Oct 14 07:48 0088
-rw------- 1 pgdba pg 262144 Oct 14 12:49 0089
-rw------- 1 pgdba pg 262144 Oct 14 16:48 008A
-rw------- 1 pgdba pg 262144 Oct 15 07:33 008B
-rw------- 1 pgdba pg 262144 Oct 15 14:30 008C
-rw------- 1 pgdba pg 262144 Oct 16 01:41 008D
-rw------- 1 pgdba pg 262144 Oct 16 12:30 008E
-rw------- 1 pgdba pg 262144 Oct 16 20:30 008F
-rw------- 1 pgdba pg 262144 Oct 17 10:32 0090
-rw------- 1 pgdba pg 262144 Oct 17 17:38 0091
-rw------- 1 pgdba pg 262144 Oct 18 10:25 0092
-rw------- 1 pgdba pg 262144 Oct 19 01:53 0093
-rw------- 1 pgdba pg 262144 Oct 19 16:38 0094
-rw------- 1 pgdba pg 262144 Oct 20 09:23 0095
-rw------- 1 pgdba pg 262144 Oct 20 16:40 0096
-rw------- 1 pgdba pg 262144 Oct 21 07:08 0097
-rw------- 1 pgdba pg 262144 Oct 21 13:31 0098
-rw------- 1 pgdba pg 262144 Oct 21 21:56 0099
-rw------- 1 pgdba pg 262144 Oct 22 10:02 009A
-rw------- 1 pgdba pg 262144 Oct 22 16:31 009B
-rw------- 1 pgdba pg 262144 Oct 22 22:59 009C
-rw------- 1 pgdba pg 262144 Oct 23 10:46 009D
-rw------- 1 pgdba pg 262144 Oct 23 17:20 009E
-rw------- 1 pgdba pg 262144 Oct 24 08:25 009F
-rw------- 1 pgdba pg 262144 Oct 24 14:48 00A0
-rw------- 1 pgdba pg 262144 Oct 25 05:45 00A1
-rw------- 1 pgdba pg 262144 Oct 25 20:22 00A2
-rw------- 1 pgdba pg 262144 Oct 26 13:16 00A3
-rw------- 1 pgdba pg 262144 Oct 27 07:34 00A4
-rw------- 1 pgdba pg 262144 Oct 27 13:54 00A5
-rw------- 1 pgdba pg 262144 Oct 28 03:14 00A6
-rw------- 1 pgdba pg 262144 Oct 28 11:58 00A7
-rw------- 1 pgdba pg 262144 Oct 28 19:36 00A8
-rw------- 1 pgdba pg 262144 Oct 29 09:39 00A9
-rw------- 1 pgdba pg 262144 Oct 29 16:27 00AA
-rw------- 1 pgdba pg 262144 Oct 30 07:23 00AB
-rw------- 1 pgdba pg 262144 Oct 30 13:43 00AC
-rw------- 1 pgdba pg 262144 Oct 31 02:31 00AD
-rw------- 1 pgdba pg 262144 Oct 31 11:59 00AE
-rw------- 1 pgdba pg 262144 Oct 31 19:54 00AF
-rw------- 1 pgdba pg 262144 Nov 1 13:44 00B0
-rw------- 1 pgdba pg 262144 Nov 2 08:26 00B1
-rw------- 1 pgdba pg 262144 Nov 2 20:59 00B2
-rw------- 1 pgdba pg 262144 Nov 3 10:33 00B3
-rw------- 1 pgdba pg 262144 Nov 3 17:21 00B4
-rw------- 1 pgdba pg 262144 Nov 4 09:01 00B5
-rw------- 1 pgdba pg 262144 Nov 4 14:44 00B6
-rw------- 1 pgdba pg 262144 Nov 5 06:33 00B7
-rw------- 1 pgdba pg 262144 Nov 5 13:17 00B8
-rw------- 1 pgdba pg 262144 Nov 5 20:45 00B9
-rw------- 1 pgdba pg 262144 Nov 6 09:45 00BA
-rw------- 1 pgdba pg 262144 Nov 6 17:04 00BB
-rw------- 1 pgdba pg 262144 Nov 7 06:55 00BC
-rw------- 1 pgdba pg 262144 Nov 7 13:31 00BD
-rw------- 1 pgdba pg 262144 Nov 8 03:58 00BE
-rw------- 1 pgdba pg 262144 Nov 8 17:04 00BF
-rw------- 1 pgdba pg 262144 Nov 9 11:14 00C0
-rw------- 1 pgdba pg 262144 Nov 10 06:16 00C1
-rw------- 1 pgdba pg 262144 Nov 10 12:47 00C2
-rw------- 1 pgdba pg 262144 Nov 10 21:18 00C3
-rw------- 1 pgdba pg 262144 Nov 11 10:34 00C4
-rw------- 1 pgdba pg 262144 Nov 11 17:23 00C5
-rw------- 1 pgdba pg 262144 Nov 12 09:15 00C6
-rw------- 1 pgdba pg 262144 Nov 12 15:03 00C7
-rw------- 1 pgdba pg 262144 Nov 13 06:30 00C8
-rw------- 1 pgdba pg 262144 Nov 13 13:56 00C9
-rw------- 1 pgdba pg 262144 Nov 14 00:38 00CA
-rw------- 1 pgdba pg 262144 Nov 14 13:06 00CB
-rw------- 1 pgdba pg 262144 Nov 14 21:27 00CC
-rw------- 1 pgdba pg 262144 Nov 15 13:25 00CD
-rw------- 1 pgdba pg 262144 Nov 16 08:57 00CE
-rw------- 1 pgdba pg 262144 Nov 16 23:22 00CF
-rw------- 1 pgdba pg 262144 Nov 17 11:49 00D0
-rw------- 1 pgdba pg 262144 Nov 17 20:12 00D1
-rw------- 1 pgdba pg 262144 Nov 18 09:10 00D2
-rw------- 1 pgdba pg 262144 Nov 18 16:02 00D3
-rw------- 1 pgdba pg 262144 Nov 19 05:23 00D4
-rw------- 1 pgdba pg 262144 Nov 19 12:27 00D5
-rw------- 1 pgdba pg 262144 Nov 19 19:22 00D6
-rw------- 1 pgdba pg 262144 Nov 20 10:36 00D7
-rw------- 1 pgdba pg 262144 Nov 20 16:40 00D8
-rw------- 1 pgdba pg 262144 Nov 21 08:19 00D9
-rw------- 1 pgdba pg 262144 Nov 21 14:53 00DA
-rw------- 1 pgdba pg 262144 Nov 22 05:41 00DB
-rw------- 1 pgdba pg 262144 Nov 22 19:28 00DC
-rw------- 1 pgdba pg 262144 Nov 23 12:30 00DD
-rw------- 1 pgdba pg 262144 Nov 24 07:24 00DE
-rw------- 1 pgdba pg 262144 Nov 24 14:18 00DF
-rw------- 1 pgdba pg 262144 Nov 25 02:03 00E0
-rw------- 1 pgdba pg 262144 Nov 25 11:47 00E1
-rw------- 1 pgdba pg 262144 Nov 25 18:46 00E2
-rw------- 1 pgdba pg 262144 Nov 26 09:57 00E3
-rw------- 1 pgdba pg 262144 Nov 26 17:09 00E4
-rw------- 1 pgdba pg 262144 Nov 27 11:48 00E5
-rw------- 1 pgdba pg 262144 Nov 28 07:43 00E6
-rw------- 1 pgdba pg 262144 Nov 28 16:12 00E7
-rw------- 1 pgdba pg 262144 Nov 29 09:02 00E8
-rw------- 1 pgdba pg 262144 Nov 30 01:06 00E9
-rw------- 1 pgdba pg 262144 Nov 30 16:51 00EA
-rw------- 1 pgdba pg 262144 Dec 1 09:23 00EB
-rw------- 1 pgdba pg 262144 Dec 1 17:05 00EC
-rw------- 1 pgdba pg 262144 Dec 2 07:24 00ED
-rw------- 1 pgdba pg 262144 Dec 2 14:19 00EE
-rw------- 1 pgdba pg 262144 Dec 3 03:52 00EF
-rw------- 1 pgdba pg 262144 Dec 3 12:51 00F0
-rw------- 1 pgdba pg 262144 Dec 3 22:34 00F1
-rw------- 1 pgdba pg 262144 Dec 4 10:46 00F2
-rw------- 1 pgdba pg 262144 Dec 4 17:20 00F3
-rw------- 1 pgdba pg 262144 Dec 5 11:34 00F4
-rw------- 1 pgdba pg 262144 Dec 6 00:23 00F5
-rw------- 1 pgdba pg 262144 Dec 6 11:07 00F6
-rw------- 1 pgdba pg 114688 Dec 6 16:10 00F7

--
Martijn van Oosterhout <kl*****@svana.org> http://svana.org/kleptog/
"All that is needed for the forces of evil to triumph is for enough good
men to do nothing." - Edmund Burke
"The penalty good people pay for not being interested in politics is to be
governed by people worse than themselves." - Plato



Nov 12 '05 #3

"Ed L." <pg***@bluepolka.net> writes:
Here's the pg_dump output, edited to
protect the guilty:

pg_dump: PANIC: open of .../data/pg_clog/04E5 failed: No such file or
directory


Given that this is far away from the range of valid clog segment names,
it seems safe to say that it's a symptom of a corrupted tuple header
(specifically, a whacked-out transaction ID number in some tuple
header).
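To put numbers on "far away": with the 7.3-era clog layout (my reading, so
treat the constants as assumptions: 2 status bits per transaction, i.e. 4
transactions per byte, and 256KB segment files named as 4 hex digits), each
segment covers 1,048,576 XIDs, and the segment for a given XID works out as:

```shell
# Segment name for a transaction ID: 4 XIDs/byte * 262144 bytes = 1048576
# XIDs per segment file, name printed as 4 hex digits.
clog_segment() { printf '%04X\n' $(( $1 / 1048576 )); }

clog_segment 260000000    # lands inside the cluster's real 0000..00F7 range
clog_segment 1314000000   # a garbage XID like this lands in the missing 04E5
```

So a reference to segment 04E5 implies an XID around 1.3 billion, while the
listed files (0000 through 00F7) show the cluster had only consumed roughly
260 million transactions.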

You could probably track down the bad row (if there's only one or a few)
by expedients like seeing how far "SELECT ... FROM sometable LIMIT n"
will go without crashing. Once you have identified where the bad row is
located, you could try to repair it, or just zero out the whole page if
you're willing to lose the other rows on the same page. I would be
interested to see a pg_filedump dump of the corrupted page, if you go as
far as finding it.
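The LIMIT-probing hunt can be mechanized as a binary search. A minimal
sketch of the search logic (the names `max_readable`, `try_fetch`, and the
row counts are all made up for illustration; `try_fetch` stands in for
running `psql -c "SELECT ... FROM sometable LIMIT n"` and reporting whether
it crashed, and it assumes reads succeed up to some row and fail from there
on):

```shell
# Stub simulating a table whose rows go bad after a made-up count.
good_rows=73182
try_fetch() { [ "$1" -le "$good_rows" ]; }   # stand-in for the real SELECT ... LIMIT $1

# Largest n for which "SELECT ... LIMIT n" still succeeds; the first
# corrupt row is then max_readable + 1.
max_readable() {
  lo=0; hi=$1
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))             # round up so the loop terminates
    if try_fetch "$mid"; then
      lo=$mid                                # LIMIT mid worked: bad row is beyond mid
    else
      hi=$(( mid - 1 ))                      # crash: bad row is at or before mid
    fi
  done
  echo "$lo"
}

max_readable 100000
```

That takes ~17 probes instead of thousands, and the resulting row number
points at the damaged page to aim pg_filedump at.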

(There are previous discussions of coping with corrupted data in the
mailing list archives. Searching for references to pg_filedump should
turn up some useful threads.)

regards, tom lane


Nov 12 '05 #4

On Monday December 8 2003 6:55, Ed L. wrote:
[Tom Lane's reply quoted; snipped]


I moved PGDATA to a new system due to catastrophic hardware failures
(media and data errors on RAID5, plus operator error when a tech pulled a
hotswap disk without failing the drive first). Now I am finally getting
a good look at the corruption (which appears to have moved around during
the scp):

$ psql -c "\d misc"
ERROR: _mdfd_getrelnfd: cannot open relation pg_depend_depender_index:
No such file or directory


And note this from .../data/base/28607376:

$ oid2name -d mydb -t pg_depend_depender_index
Oid of table pg_depend_depender_index from database "mydb":
---------------------------------
16622 = pg_depend_depender_index
$ ls -l 16622
ls: 16622: No such file or directory

Any clues as to first steps at recovery? Recovering from backup is
unfortunately not a very viable option.

Ed

Nov 12 '05 #5

"Ed L." <pg***@bluepolka.net> writes:
Now I am finally getting a good look at
the corruption (which appears to have moved around during the scp):


Hm. I don't see anything particularly exceptionable in pg_class page 11
--- rather a lot of dead tuples, but that's not proof of corruption.
To judge by your SELECT results, there are *no* live tuples in pg_class
between pages 11 and 543, and a bad page header in page 543. What do
you see if you ask pg_filedump to dump all that page range? (It'd be
a bit much to send to the list, but you can send it to me off-list.)

regards, tom lane


Nov 12 '05 #6

Ed Loehr <ed@LoehrTech.com> writes:
This is pg_class; look at the ASCII names on the right. I notice that one
name (misc_doctors) appears twice.

Sure, but one of those rows is committed dead. Looks like a perfectly
ordinary case of a not-yet-vacuumed update to me.

regards, tom lane


Nov 12 '05 #7

On Monday December 8 2003 8:23, you wrote:
[previous reply quoted in full; snipped]


This is pg_class; look at the ASCII names on the right. I notice that one
name (misc_doctors) appears twice. We also have an error dumping that
table in which the end of the dump gets two BLANK tuples in the output,
causing the load to fail due to missing columns. Is it possible that we
have two pg_class tuples with the same relname, and if so, is that
corruption?

Will send full dump...

TIA
Ed

Nov 12 '05 #8

This discussion thread is closed; replies have been disabled.