
corruption diag/recovery, pg_dump crash

We are seeing what looks like pgsql data file corruption across multiple
clusters on a RAID5 partition on a single Red Hat Linux 2.4 server running
PostgreSQL 7.3.4. The system has ~20 clusters installed with a mix of 7.2.3,
7.3.2, and 7.3.4 (mostly 7.3.4), 10GB RAM, 76GB on RAID5, dual CPUs, and is
very busy with hundreds and sometimes > 1000 simultaneous connections. After
~250 days of continuous, flawless operation, we recently began seeing major
performance degradation accompanied by messages like the following:

ERROR: Invalid page header in block NN of some_relation (10-15 instances)

ERROR: XLogFlush: request 38/5E659BA0 is not satisfied ... (1 instance
repeated many times)

I think I've been able to repair most of the "Invalid page header" errors by
rebuilding indices or truncating/reloading table data. The XLogFlush error
was occurring for a particular index, and a drop/reload has at least stopped
that error. Now, a pg_dump error is occurring on one cluster, preventing a
successful dump. Of course, it went unnoticed long enough to roll over our
good online backups, and the bazillion-dollar offline/offsite backup system
wasn't working properly. Here's the pg_dump output, edited to protect the
guilty:

pg_dump: PANIC: open of .../data/pg_clog/04E5 failed: No such file or
directory
pg_dump: lost synchronization with server, resetting connection
pg_dump: WARNING: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted ... blah blah
pg_dump: SQL command to dump the contents of table "sometable" failed:
PQendcopy() failed.
pg_dump: Error message from server: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: The command was: COPY public.sometable ("key", ...) TO stdout;
pg_dumpall: pg_dump failed on somedb, exiting

Why that 04E5 file is missing, I haven't a clue. I've attached an "ls -l"
for the pg_clog dir.

Past list discussions suggest this may be an elusive hardware issue. We did
find a msg in /var/log/messages...

kernel: ISR called reentrantly!!

which some here have found newsgroup reports linking to some sort of
RAID/BIOS issue. We've taken the machine offline and conducted extensive
hardware diagnostics on RAID controller, filesystem (fsck), RAM, and found
no further indication of hardware failure. The machine had run flawlessly
for these ~20 clusters for ~250 days until cratering yesterday amidst these
errors and absurd system (disk) I/O sluggishness. Upon reboot and upgrades,
the machine continues to exhibit infrequent corruption (or at least
infrequently discovered corruption). On the advice of the hardware vendor's
(Dell) support folks, we've upgraded our kernel (now 2.4.20-24.7bigmem),
several drivers, and the RAID controller firmware, rebooted, etc. The disk
I/O sluggishness has largely diminished, but we're still seeing the Invalid
page header errors pop up anew, albeit
infrequently. The XLogFlush error seems to have gone away with the
reconstruction of an index.
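
For the record, those repairs amounted to ordinary REINDEX and
truncate/reload commands, roughly like the following (a sketch only; the
relation and file names are placeholders, not the real ones):

$ psql somedb -c "REINDEX INDEX some_corrupt_index"
$ # or equivalently, drop and recreate the index from its definition
$ psql somedb -c "TRUNCATE TABLE some_relation"
$ psql somedb -c "COPY some_relation FROM '/tmp/some_relation.copy'"  # reload from a known-good copy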

Current plan is to get as much data recovered as possible, and then do
significant hardware replacements (along with more frequent planned reboots
and more vigilant backups).

Any clues/suggestions for recovering this data or fixing other issues would
be greatly appreciated.

TIA.

Nov 12 '05 #1
Maybe worth mentioning: the system has one 7.2.3 cluster, five 7.3.2
clusters, and twelve 7.3.4 clusters, all with data on the same
partition/device, and all corruption has occurred on only five of the
twelve 7.3.4 clusters.

TIA.

On Saturday December 6 2003 2:30, Ed L. wrote:
[original post quoted in full; snipped]


Nov 12 '05 #2
While I can't help you with most of your message, the pg_clog one is easier:
creating a file of that name containing 256KB of zeros will let postgres
complete the dump.

*HOWEVER*, what this means is that one of the tuple headers in the database
refers to a nonexistent transaction, so there is definitely some kind of
corruption going on there.
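
Something along these lines should do it (a sketch, assuming the missing
segment is 04E5 as in your PANIC message and that $PGDATA points at the
affected cluster's data directory; adjust owner/group to your installation):

$ # one clog segment is 256KB; a zero-filled one satisfies the failing open()
$ dd if=/dev/zero of=$PGDATA/pg_clog/04E5 bs=256k count=1
$ chown pgdba:pg $PGDATA/pg_clog/04E5   # if created as root
$ chmod 600 $PGDATA/pg_clog/04E5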

Hope this helps,

On Sat, Dec 06, 2003 at 02:30:37PM -0700, Ed L. wrote:
[original post quoted in full; snipped. The "ls -l" of the pg_clog directory
attached to that post follows.]

total 64336
-rw------- 1 pgdba pg 262144 Aug 12 18:39 0000
-rw------- 1 pgdba pg 262144 Aug 14 11:56 0001
-rw------- 1 pgdba pg 262144 Aug 14 20:22 0002
-rw------- 1 pgdba pg 262144 Aug 15 16:01 0003
-rw------- 1 pgdba pg 262144 Aug 15 23:08 0004
-rw------- 1 pgdba pg 262144 Aug 16 05:33 0005
-rw------- 1 pgdba pg 262144 Aug 16 11:42 0006
-rw------- 1 pgdba pg 262144 Aug 16 18:25 0007
-rw------- 1 pgdba pg 262144 Aug 16 23:57 0008
-rw------- 1 pgdba pg 262144 Aug 17 08:16 0009
-rw------- 1 pgdba pg 262144 Aug 17 14:31 000A
-rw------- 1 pgdba pg 262144 Aug 17 20:24 000B
-rw------- 1 pgdba pg 262144 Aug 17 23:57 000C
-rw------- 1 pgdba pg 262144 Aug 18 03:33 000D
-rw------- 1 pgdba pg 262144 Aug 18 13:01 000E
-rw------- 1 pgdba pg 262144 Aug 19 13:03 000F
-rw------- 1 pgdba pg 262144 Aug 19 18:54 0010
-rw------- 1 pgdba pg 262144 Aug 19 23:19 0011
-rw------- 1 pgdba pg 262144 Aug 20 04:29 0012
-rw------- 1 pgdba pg 262144 Aug 20 12:50 0013
-rw------- 1 pgdba pg 262144 Aug 20 15:00 0014
-rw------- 1 pgdba pg 262144 Aug 20 23:29 0015
-rw------- 1 pgdba pg 262144 Aug 21 11:50 0016
-rw------- 1 pgdba pg 262144 Aug 21 16:36 0017
-rw------- 1 pgdba pg 262144 Aug 21 21:36 0018
-rw------- 1 pgdba pg 262144 Aug 22 03:24 0019
-rw------- 1 pgdba pg 262144 Aug 22 09:16 001A
-rw------- 1 pgdba pg 262144 Aug 22 15:59 001B
-rw------- 1 pgdba pg 262144 Aug 23 06:39 001C
-rw------- 1 pgdba pg 262144 Aug 24 01:10 001D
-rw------- 1 pgdba pg 262144 Aug 24 15:53 001E
-rw------- 1 pgdba pg 262144 Aug 25 09:54 001F
-rw------- 1 pgdba pg 262144 Aug 25 14:37 0020
-rw------- 1 pgdba pg 262144 Aug 26 01:29 0021
-rw------- 1 pgdba pg 262144 Aug 26 13:13 0022
-rw------- 1 pgdba pg 262144 Aug 26 18:26 0023
-rw------- 1 pgdba pg 262144 Aug 27 10:14 0024
-rw------- 1 pgdba pg 262144 Aug 27 17:10 0025
-rw------- 1 pgdba pg 262144 Aug 28 08:31 0026
-rw------- 1 pgdba pg 262144 Aug 28 15:21 0027
-rw------- 1 pgdba pg 262144 Aug 29 06:11 0028
-rw------- 1 pgdba pg 262144 Aug 29 13:56 0029
-rw------- 1 pgdba pg 262144 Aug 30 03:51 002A
-rw------- 1 pgdba pg 262144 Aug 30 17:15 002B
-rw------- 1 pgdba pg 262144 Aug 31 11:31 002C
-rw------- 1 pgdba pg 262144 Sep 1 04:59 002D
-rw------- 1 pgdba pg 262144 Sep 1 17:01 002E
-rw------- 1 pgdba pg 262144 Sep 2 09:52 002F
-rw------- 1 pgdba pg 262144 Sep 2 16:24 0030
-rw------- 1 pgdba pg 262144 Sep 3 07:07 0031
-rw------- 1 pgdba pg 262144 Sep 3 13:27 0032
-rw------- 1 pgdba pg 262144 Sep 4 04:25 0033
-rw------- 1 pgdba pg 262144 Sep 4 13:11 0034
-rw------- 1 pgdba pg 262144 Sep 5 02:11 0035
-rw------- 1 pgdba pg 262144 Sep 5 12:31 0036
-rw------- 1 pgdba pg 262144 Sep 6 01:18 0037
-rw------- 1 pgdba pg 262144 Sep 6 17:12 0038
-rw------- 1 pgdba pg 262144 Sep 7 12:01 0039
-rw------- 1 pgdba pg 262144 Sep 8 08:00 003A
-rw------- 1 pgdba pg 262144 Sep 8 14:32 003B
-rw------- 1 pgdba pg 262144 Sep 9 06:14 003C
-rw------- 1 pgdba pg 262144 Sep 9 13:12 003D
-rw------- 1 pgdba pg 262144 Sep 9 20:56 003E
-rw------- 1 pgdba pg 262144 Sep 10 09:26 003F
-rw------- 1 pgdba pg 262144 Sep 10 14:27 0040
-rw------- 1 pgdba pg 262144 Sep 10 20:29 0041
-rw------- 1 pgdba pg 262144 Sep 11 03:29 0042
-rw------- 1 pgdba pg 262144 Sep 11 12:00 0043
-rw------- 1 pgdba pg 262144 Sep 11 20:27 0044
-rw------- 1 pgdba pg 262144 Sep 12 09:01 0045
-rw------- 1 pgdba pg 262144 Sep 12 15:37 0046
-rw------- 1 pgdba pg 262144 Sep 13 07:29 0047
-rw------- 1 pgdba pg 262144 Sep 13 18:59 0048
-rw------- 1 pgdba pg 262144 Sep 14 12:05 0049
-rw------- 1 pgdba pg 262144 Sep 15 07:17 004A
-rw------- 1 pgdba pg 262144 Sep 15 13:53 004B
-rw------- 1 pgdba pg 262144 Sep 16 01:09 004C
-rw------- 1 pgdba pg 262144 Sep 16 11:18 004D
-rw------- 1 pgdba pg 262144 Sep 16 18:46 004E
-rw------- 1 pgdba pg 262144 Sep 17 09:17 004F
-rw------- 1 pgdba pg 262144 Sep 17 16:45 0050
-rw------- 1 pgdba pg 262144 Sep 18 07:39 0051
-rw------- 1 pgdba pg 262144 Sep 18 14:20 0052
-rw------- 1 pgdba pg 262144 Sep 19 01:38 0053
-rw------- 1 pgdba pg 262144 Sep 19 12:05 0054
-rw------- 1 pgdba pg 262144 Sep 19 22:39 0055
-rw------- 1 pgdba pg 262144 Sep 20 13:55 0056
-rw------- 1 pgdba pg 262144 Sep 21 09:02 0057
-rw------- 1 pgdba pg 262144 Sep 22 02:47 0058
-rw------- 1 pgdba pg 262144 Sep 22 12:42 0059
-rw------- 1 pgdba pg 262144 Sep 22 21:57 005A
-rw------- 1 pgdba pg 262144 Sep 23 10:28 005B
-rw------- 1 pgdba pg 262144 Sep 23 18:00 005C
-rw------- 1 pgdba pg 262144 Sep 24 08:52 005D
-rw------- 1 pgdba pg 262144 Sep 24 15:14 005E
-rw------- 1 pgdba pg 262144 Sep 25 04:16 005F
-rw------- 1 pgdba pg 262144 Sep 25 12:17 0060
-rw------- 1 pgdba pg 262144 Sep 25 20:17 0061
-rw------- 1 pgdba pg 262144 Sep 26 10:07 0062
-rw------- 1 pgdba pg 262144 Sep 26 16:24 0063
-rw------- 1 pgdba pg 262144 Sep 27 09:20 0064
-rw------- 1 pgdba pg 262144 Sep 28 00:27 0065
-rw------- 1 pgdba pg 262144 Sep 28 16:17 0066
-rw------- 1 pgdba pg 262144 Sep 29 09:45 0067
-rw------- 1 pgdba pg 262144 Sep 29 16:37 0068
-rw------- 1 pgdba pg 262144 Sep 30 07:44 0069
-rw------- 1 pgdba pg 262144 Sep 30 15:03 006A
-rw------- 1 pgdba pg 262144 Oct 1 05:59 006B
-rw------- 1 pgdba pg 262144 Oct 1 12:52 006C
-rw------- 1 pgdba pg 262144 Oct 1 22:19 006D
-rw------- 1 pgdba pg 262144 Oct 2 10:53 006E
-rw------- 1 pgdba pg 262144 Oct 2 19:28 006F
-rw------- 1 pgdba pg 262144 Oct 3 10:18 0070
-rw------- 1 pgdba pg 262144 Oct 3 19:11 0071
-rw------- 1 pgdba pg 262144 Oct 4 12:42 0072
-rw------- 1 pgdba pg 262144 Oct 5 08:24 0073
-rw------- 1 pgdba pg 262144 Oct 6 00:03 0074
-rw------- 1 pgdba pg 262144 Oct 6 11:57 0075
-rw------- 1 pgdba pg 262144 Oct 6 19:46 0076
-rw------- 1 pgdba pg 262144 Oct 7 09:43 0077
-rw------- 1 pgdba pg 262144 Oct 7 17:09 0078
-rw------- 1 pgdba pg 262144 Oct 8 07:33 0079
-rw------- 1 pgdba pg 262144 Oct 8 13:34 007A
-rw------- 1 pgdba pg 262144 Oct 8 18:41 007B
-rw------- 1 pgdba pg 262144 Oct 8 23:28 007C
-rw------- 1 pgdba pg 262144 Oct 9 09:51 007D
-rw------- 1 pgdba pg 262144 Oct 9 14:22 007E
-rw------- 1 pgdba pg 262144 Oct 9 17:04 007F
-rw------- 1 pgdba pg 262144 Oct 10 06:56 0080
-rw------- 1 pgdba pg 262144 Oct 10 12:31 0081
-rw------- 1 pgdba pg 262144 Oct 10 18:19 0082
-rw------- 1 pgdba pg 262144 Oct 11 10:22 0083
-rw------- 1 pgdba pg 262144 Oct 12 02:29 0084
-rw------- 1 pgdba pg 262144 Oct 12 17:43 0085
-rw------- 1 pgdba pg 262144 Oct 13 09:49 0086
-rw------- 1 pgdba pg 262144 Oct 13 17:00 0087
-rw------- 1 pgdba pg 262144 Oct 14 07:48 0088
-rw------- 1 pgdba pg 262144 Oct 14 12:49 0089
-rw------- 1 pgdba pg 262144 Oct 14 16:48 008A
-rw------- 1 pgdba pg 262144 Oct 15 07:33 008B
-rw------- 1 pgdba pg 262144 Oct 15 14:30 008C
-rw------- 1 pgdba pg 262144 Oct 16 01:41 008D
-rw------- 1 pgdba pg 262144 Oct 16 12:30 008E
-rw------- 1 pgdba pg 262144 Oct 16 20:30 008F
-rw------- 1 pgdba pg 262144 Oct 17 10:32 0090
-rw------- 1 pgdba pg 262144 Oct 17 17:38 0091
-rw------- 1 pgdba pg 262144 Oct 18 10:25 0092
-rw------- 1 pgdba pg 262144 Oct 19 01:53 0093
-rw------- 1 pgdba pg 262144 Oct 19 16:38 0094
-rw------- 1 pgdba pg 262144 Oct 20 09:23 0095
-rw------- 1 pgdba pg 262144 Oct 20 16:40 0096
-rw------- 1 pgdba pg 262144 Oct 21 07:08 0097
-rw------- 1 pgdba pg 262144 Oct 21 13:31 0098
-rw------- 1 pgdba pg 262144 Oct 21 21:56 0099
-rw------- 1 pgdba pg 262144 Oct 22 10:02 009A
-rw------- 1 pgdba pg 262144 Oct 22 16:31 009B
-rw------- 1 pgdba pg 262144 Oct 22 22:59 009C
-rw------- 1 pgdba pg 262144 Oct 23 10:46 009D
-rw------- 1 pgdba pg 262144 Oct 23 17:20 009E
-rw------- 1 pgdba pg 262144 Oct 24 08:25 009F
-rw------- 1 pgdba pg 262144 Oct 24 14:48 00A0
-rw------- 1 pgdba pg 262144 Oct 25 05:45 00A1
-rw------- 1 pgdba pg 262144 Oct 25 20:22 00A2
-rw------- 1 pgdba pg 262144 Oct 26 13:16 00A3
-rw------- 1 pgdba pg 262144 Oct 27 07:34 00A4
-rw------- 1 pgdba pg 262144 Oct 27 13:54 00A5
-rw------- 1 pgdba pg 262144 Oct 28 03:14 00A6
-rw------- 1 pgdba pg 262144 Oct 28 11:58 00A7
-rw------- 1 pgdba pg 262144 Oct 28 19:36 00A8
-rw------- 1 pgdba pg 262144 Oct 29 09:39 00A9
-rw------- 1 pgdba pg 262144 Oct 29 16:27 00AA
-rw------- 1 pgdba pg 262144 Oct 30 07:23 00AB
-rw------- 1 pgdba pg 262144 Oct 30 13:43 00AC
-rw------- 1 pgdba pg 262144 Oct 31 02:31 00AD
-rw------- 1 pgdba pg 262144 Oct 31 11:59 00AE
-rw------- 1 pgdba pg 262144 Oct 31 19:54 00AF
-rw------- 1 pgdba pg 262144 Nov 1 13:44 00B0
-rw------- 1 pgdba pg 262144 Nov 2 08:26 00B1
-rw------- 1 pgdba pg 262144 Nov 2 20:59 00B2
-rw------- 1 pgdba pg 262144 Nov 3 10:33 00B3
-rw------- 1 pgdba pg 262144 Nov 3 17:21 00B4
-rw------- 1 pgdba pg 262144 Nov 4 09:01 00B5
-rw------- 1 pgdba pg 262144 Nov 4 14:44 00B6
-rw------- 1 pgdba pg 262144 Nov 5 06:33 00B7
-rw------- 1 pgdba pg 262144 Nov 5 13:17 00B8
-rw------- 1 pgdba pg 262144 Nov 5 20:45 00B9
-rw------- 1 pgdba pg 262144 Nov 6 09:45 00BA
-rw------- 1 pgdba pg 262144 Nov 6 17:04 00BB
-rw------- 1 pgdba pg 262144 Nov 7 06:55 00BC
-rw------- 1 pgdba pg 262144 Nov 7 13:31 00BD
-rw------- 1 pgdba pg 262144 Nov 8 03:58 00BE
-rw------- 1 pgdba pg 262144 Nov 8 17:04 00BF
-rw------- 1 pgdba pg 262144 Nov 9 11:14 00C0
-rw------- 1 pgdba pg 262144 Nov 10 06:16 00C1
-rw------- 1 pgdba pg 262144 Nov 10 12:47 00C2
-rw------- 1 pgdba pg 262144 Nov 10 21:18 00C3
-rw------- 1 pgdba pg 262144 Nov 11 10:34 00C4
-rw------- 1 pgdba pg 262144 Nov 11 17:23 00C5
-rw------- 1 pgdba pg 262144 Nov 12 09:15 00C6
-rw------- 1 pgdba pg 262144 Nov 12 15:03 00C7
-rw------- 1 pgdba pg 262144 Nov 13 06:30 00C8
-rw------- 1 pgdba pg 262144 Nov 13 13:56 00C9
-rw------- 1 pgdba pg 262144 Nov 14 00:38 00CA
-rw------- 1 pgdba pg 262144 Nov 14 13:06 00CB
-rw------- 1 pgdba pg 262144 Nov 14 21:27 00CC
-rw------- 1 pgdba pg 262144 Nov 15 13:25 00CD
-rw------- 1 pgdba pg 262144 Nov 16 08:57 00CE
-rw------- 1 pgdba pg 262144 Nov 16 23:22 00CF
-rw------- 1 pgdba pg 262144 Nov 17 11:49 00D0
-rw------- 1 pgdba pg 262144 Nov 17 20:12 00D1
-rw------- 1 pgdba pg 262144 Nov 18 09:10 00D2
-rw------- 1 pgdba pg 262144 Nov 18 16:02 00D3
-rw------- 1 pgdba pg 262144 Nov 19 05:23 00D4
-rw------- 1 pgdba pg 262144 Nov 19 12:27 00D5
-rw------- 1 pgdba pg 262144 Nov 19 19:22 00D6
-rw------- 1 pgdba pg 262144 Nov 20 10:36 00D7
-rw------- 1 pgdba pg 262144 Nov 20 16:40 00D8
-rw------- 1 pgdba pg 262144 Nov 21 08:19 00D9
-rw------- 1 pgdba pg 262144 Nov 21 14:53 00DA
-rw------- 1 pgdba pg 262144 Nov 22 05:41 00DB
-rw------- 1 pgdba pg 262144 Nov 22 19:28 00DC
-rw------- 1 pgdba pg 262144 Nov 23 12:30 00DD
-rw------- 1 pgdba pg 262144 Nov 24 07:24 00DE
-rw------- 1 pgdba pg 262144 Nov 24 14:18 00DF
-rw------- 1 pgdba pg 262144 Nov 25 02:03 00E0
-rw------- 1 pgdba pg 262144 Nov 25 11:47 00E1
-rw------- 1 pgdba pg 262144 Nov 25 18:46 00E2
-rw------- 1 pgdba pg 262144 Nov 26 09:57 00E3
-rw------- 1 pgdba pg 262144 Nov 26 17:09 00E4
-rw------- 1 pgdba pg 262144 Nov 27 11:48 00E5
-rw------- 1 pgdba pg 262144 Nov 28 07:43 00E6
-rw------- 1 pgdba pg 262144 Nov 28 16:12 00E7
-rw------- 1 pgdba pg 262144 Nov 29 09:02 00E8
-rw------- 1 pgdba pg 262144 Nov 30 01:06 00E9
-rw------- 1 pgdba pg 262144 Nov 30 16:51 00EA
-rw------- 1 pgdba pg 262144 Dec 1 09:23 00EB
-rw------- 1 pgdba pg 262144 Dec 1 17:05 00EC
-rw------- 1 pgdba pg 262144 Dec 2 07:24 00ED
-rw------- 1 pgdba pg 262144 Dec 2 14:19 00EE
-rw------- 1 pgdba pg 262144 Dec 3 03:52 00EF
-rw------- 1 pgdba pg 262144 Dec 3 12:51 00F0
-rw------- 1 pgdba pg 262144 Dec 3 22:34 00F1
-rw------- 1 pgdba pg 262144 Dec 4 10:46 00F2
-rw------- 1 pgdba pg 262144 Dec 4 17:20 00F3
-rw------- 1 pgdba pg 262144 Dec 5 11:34 00F4
-rw------- 1 pgdba pg 262144 Dec 6 00:23 00F5
-rw------- 1 pgdba pg 262144 Dec 6 11:07 00F6
-rw------- 1 pgdba pg 114688 Dec 6 16:10 00F7

--
Martijn van Oosterhout <kl*****@svana.org>   http://svana.org/kleptog/
"All that is needed for the forces of evil to triumph is for enough good
men to do nothing." - Edmund Burke
"The penalty good people pay for not being interested in politics is to be
governed by people worse than themselves." - Plato



Nov 12 '05 #3
"Ed L." <pg***@bluepolk a.net> writes:
Here's the pg_dump output, edited to=20
protect the guilty: pg_dump: PANIC: open of .../data/pg_clog/04E5 failed: No such file or=20
directory


Given that this is far away from the range of valid clog segment names,
it seems safe to say that it's a symptom of a corrupted tuple header
(specifically, a whacked-out transaction ID number in some tuple
header).

You could probably track down the bad row (if there's only one or a few)
by expedients like seeing how far "SELECT ... FROM sometable LIMIT n"
will go without crashing. Once you have identified where the bad row is
located, you could try to repair it, or just zero out the whole page if
you're willing to lose the other rows on the same page. I would be
interested to see a pg_filedump dump of the corrupted page, if you go as
far as finding it.
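
For example (a rough sketch; the table and database names follow the edited
log above, the block number is hypothetical, and the pg_filedump switches
are from memory, so check its usage message first):

$ # raise/lower the LIMIT until the backend crashes; the boundary is near the bad row
$ psql somedb -c "SELECT * FROM sometable LIMIT 100000" > /dev/null
$ # the ctid of the last readable row gives the page (block) number
$ psql somedb -c "SELECT ctid FROM sometable LIMIT 1 OFFSET 99999"
$ # dump the suspect block; heap files live under base/<db-oid>/<relfilenode>
$ pg_filedump -i -f -R 1234 1234 $PGDATA/base/<db-oid>/<relfilenode>
$ # last resort: zero out that 8K page in place (stop the postmaster first)
$ dd if=/dev/zero of=$PGDATA/base/<db-oid>/<relfilenode> bs=8192 seek=1234 count=1 conv=notrunc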

(There are previous discussions of coping with corrupted data in the
mailing list archives. Searching for references to pg_filedump should
turn up some useful threads.)

regards, tom lane


Nov 12 '05 #4
On Monday December 8 2003 6:55, Ed L. wrote:
On Saturday December 6 2003 4:43, Tom Lane wrote:
Given that this is far away from the range of valid clog segment names,
it seems safe to say that it's a symptom of a corrupted tuple header
(specifically, a whacked-out transaction ID number in some tuple header).


I moved PGDATA to a new system due to catastrophic hardware failures
(media and data errors on RAID5 + operator error when a tech pulled a
hotswap disk without failing the drive first). Now I am finally getting
a good look at the corruption (which appears to have moved around during
the scp):

$ psql -c "\d misc"
ERROR: _mdfd_getrelnfd: cannot open relation pg_depend_depender_index:
No such file or directory


And note this from .../data/base/28607376:

$ oid2name -d mydb -t pg_depend_depender_index
Oid of table pg_depend_depender_index from database "mydb":
---------------------------------
16622 = pg_depend_depender_index
$ ls -l 16622
ls: 16622: No such file or directory

Any clues as to first steps at recovery? Recovering from backup is
unfortunately not a very viable option.

Ed

Nov 12 '05 #5
"Ed L." <pg***@bluepolk a.net> writes:
Now I am finally getting a good look at
the corruption (which appears to have moved around during the scp):


Hm. I don't see anything particularly exceptionable in pg_class page 11
--- rather a lot of dead tuples, but that's not proof of corruption.
To judge by your SELECT results, there are *no* live tuples in pg_class
between pages 11 and 543, and a bad page header in page 543. What do
you see if you ask pg_filedump to dump all that page range? (It'd be
a bit much to send to the list, but you can send it to me off-list.)
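
A sketch of that invocation (assuming pg_class still has its default
relfilenode, 1259, under the base/28607376 directory you mentioned;
double-check the path before running it):

$ pg_filedump -i -f -R 11 543 $PGDATA/base/28607376/1259 > pg_class.pages11-543.out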

regards, tom lane


Nov 12 '05 #6
Ed Loehr <ed@LoehrTech.com> writes:
This is pg_class; look at the ASCII names on the right. I notice that one
name (misc_doctors) appears twice.


Sure, but one of those rows is committed dead. Looks like a perfectly
ordinary case of a not-yet-vacuumed update to me.
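
One way to see that from SQL, since a plain SELECT returns only live rows
and the hidden xmin/xmax system columns show the row versions (database name
taken from your oid2name example):

$ psql mydb -c "SELECT oid, relfilenode, xmin, xmax FROM pg_class WHERE relname = 'misc_doctors'"

If only one row comes back, the second misc_doctors entry in the pg_filedump
output is just the dead version left behind by an update.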

regards, tom lane


Nov 12 '05 #7
On Monday December 8 2003 8:23, you wrote:
"Ed L." <pg***@bluepolk a.net> writes:
Now I am finally getting a good look at
the corruption (which appears to have moved around during the scp):


Hm. I don't see anything particularly exceptionable in pg_class page 11
--- rather a lot of dead tuples, but that's not proof of corruption.
To judge by your SELECT results, there are *no* live tuples in pg_class
between pages 11 and 543, and a bad page header in page 543. What do
you see if you ask pg_filedump to dump all that page range? (It'd be
a bit much to send to the list, but you can send it to me off-list.)


This is pg_class; look at the ASCII names on the right. I notice that one
name (misc_doctors) appears twice. We also have an error dumping that
table in which the end of the dump gets two BLANK tuples in the output,
causing the load to fail due to missing columns. Is it possible that we
have two pg_class tuples with the same relname, and if so, is that
corruption?

Will send full dump...

TIA
Ed

Nov 12 '05 #8
