468,101 Members | 1,487 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 468,101 developers. It's quick & easy.

Corrupted Data?

Hello, pg_dump started failing for one of my databases, so I looked in
to it and it appears that I have some corrupted data or something. I
assume this is related to a failed hard disk that was part of the linux
software raid mirror.

I backed up the entire data directory, and did a pg_resetxlog, but that
didn't help. I found the specific row that seems to be the problem, but
I can't delete it.

Anyway, I don't know how to fix this, so if you could please help, I
would appreciate it.

Details are as follows:

[dbmail2@dezeut dbmail2]$ psql
Welcome to psql 7.4.2, the PostgreSQL interactive terminal.
dbmail2=# SELECT version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 7.4.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.2
20030222 (Red Hat Linux 3.2.2-5)

dbmail2=# SELECT oid,messageblk_idnr, physmessage_id, blocksize from
messageblks where messageblk_idnr =7718;
oid | messageblk_idnr | physmessage_id | blocksize
---------+-----------------+----------------+-----------
2916427 | 7718 | 3842 | 524288
(1 row)

dbmail2=# SELECT oid,messageblk_idnr, physmessage_id, blocksize,
messageblk from messageblks where messageblk_idnr =7718;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q
[dbmail2@dezeut dbmail2]$ psql
Welcome to psql 7.4.2, the PostgreSQL interactive terminal.
dbmail2=# delete from messageblks where oid = 2916427;
ERROR: could not access status of transaction 3822646358
DETAIL: could not open file "/var/lib/pgsql/data/pg_clog/0E3D": No such
file or directory


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 23 '05 #1
9 4694
On Mon, 13 Sep 2004, Matthew T. O'Connor wrote:
I backed up the entire data directory, and did a pg_resetxlog, but that
didn't help. I found the specific row that seems to be the problem, but
I can't delete it.


I have used TRUNCATE on the table in this situation to recover. Another
option might be to DROP the table. Or perhaps restore from backups.

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #2
On Tue, 2004-09-14 at 00:46, Chester Kustarz wrote:
On Mon, 13 Sep 2004, Matthew T. O'Connor wrote:
I backed up the entire data directory, and did a pg_resetxlog, but that
didn't help. I found the specific row that seems to be the problem, but
I can't delete it.


I have used TRUNCATE on the table in this situation to recover. Another
option might be to DROP the table. Or perhaps restore from backups.


I would really prefer not to do that as pg_dump has apparently been
failing for a while so I would lose a fair amount of data.
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 23 '05 #3
On Tue, Sep 14, 2004 at 08:13:24AM -0400, Matthew T. O'Connor wrote:
On Tue, 2004-09-14 at 00:46, Chester Kustarz wrote:
On Mon, 13 Sep 2004, Matthew T. O'Connor wrote:
I backed up the entire data directory, and did a pg_resetxlog, but that
didn't help. I found the specific row that seems to be the problem, but
I can't delete it.


I have used TRUNCATE on the table in this situation to recover. Another
option might be to DROP the table. Or perhaps restore from backups.


I would really prefer not to do that as pg_dump has apparently been
failing for a while so I would lose a fair amount of data.


You can create a pg_clog file (the one it's complaining about) filled
with zeros, using
dd if=/dev/zero bs=1k count=8 of=/path/to/data/pg_clog/0E3D

and then you should be able to pg_dump the table (or at least find out
if there is another corrupted tuple.) Beware that the corrupted tuple
may be in there if it's supposed not to be, or it may not be if it's
supposed to be.

After you get your data back, I'd suggest running the usual hardware
checking tools, and restore from the backup.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Use it up, wear it out, make it do, or do without"
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 23 '05 #4
Alvaro Herrera wrote:
On Tue, Sep 14, 2004 at 08:13:24AM -0400, Matthew T. O'Connor wrote:
You can create a pg_clog file (the one it's complaining about) filled
with zeros, using
dd if=/dev/zero bs=1k count=8 of=/path/to/data/pg_clog/0E3D


Ok, I tried this, and it changed the error but hasn't fixed the problem
now I get this:

[dbmail2@dezeut dbmail2]$ psql
Welcome to psql 7.4.2, the PostgreSQL interactive terminal.
dbmail2=# delete from messageblks where messageblk_idnr = 7718;
ERROR: could not access status of transaction 3822646358
DETAIL: could not read from file "/var/lib/pgsql/data/pg_clog/0E3D" at
offset 139264: Success

And in the log file I get this:

ERROR: XX000: could not access status of transaction 3822646358
DETAIL: could not read from file "/var/lib/pgsql/data/pg_clog/0E3D" at
offset 139264: Success
LOCATION: SlruReportIOError, slru.c:634

Any more thoughts?

Thanks again,

Matthew

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 23 '05 #5
On Tue, Sep 14, 2004 at 08:01:13PM -0400, Matthew T. O'Connor wrote:
Alvaro Herrera wrote:
On Tue, Sep 14, 2004 at 08:13:24AM -0400, Matthew T. O'Connor wrote:
You can create a pg_clog file (the one it's complaining about) filled
with zeros, using
dd if=/dev/zero bs=1k count=8 of=/path/to/data/pg_clog/0E3D


Ok, I tried this, and it changed the error but hasn't fixed the problem
now I get this:

[dbmail2@dezeut dbmail2]$ psql
Welcome to psql 7.4.2, the PostgreSQL interactive terminal.
dbmail2=# delete from messageblks where messageblk_idnr = 7718;
ERROR: could not access status of transaction 3822646358
DETAIL: could not read from file "/var/lib/pgsql/data/pg_clog/0E3D" at
offset 139264: Success


Huh, sorry, the directions only created the first block of the file, but
you needed the 17th ...

dd if=/dev/zero bs=8k count=17 of=/path/to/data/pg_clog/0E3D

I may be subject of a fencepost problem here, so if it doesn't work try
with 18.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Limítate a mirar... y algun día veras"
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 23 '05 #6
Alvaro Herrera wrote:
Huh, sorry, the directions only created the first block of the file, but
you needed the 17th ...

dd if=/dev/zero bs=8k count=17 of=/path/to/data/pg_clog/0E3D

I may be subject of a fencepost problem here, so if it doesn't work try
with 18.


I don't know if we are making progress but I am getting a different
error now :-)

I did the dd command again this time with count=18. Now when I try to
delete the tuple I get this:

dbmail2=# delete from messageblks where oid = 2916427;
ERROR: attempted to delete invisible tuple

The Postmaster log has this to say:

ERROR: XX000: attempted to delete invisible tuple
LOCATION: heap_delete, heapam.c:1258
Thanks again for the help.

Matthew

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 23 '05 #7
On Tue, Sep 14, 2004 at 10:01:21PM -0400, Matthew T. O'Connor wrote:
Alvaro Herrera wrote:
Huh, sorry, the directions only created the first block of the file, but
you needed the 17th ...

dd if=/dev/zero bs=8k count=17 of=/path/to/data/pg_clog/0E3D

I may be subject of a fencepost problem here, so if it doesn't work try
with 18.


I don't know if we are making progress but I am getting a different
error now :-)

I did the dd command again this time with count=18. Now when I try to
delete the tuple I get this:

dbmail2=# delete from messageblks where oid = 2916427;
ERROR: attempted to delete invisible tuple


I think I know what is going on, but I'm not sure how to solve the
problem. If I were in your situation I'd edit the data file and stash
FrozenTransactionId in the Xmin and InvalidTransactionId in Xmax for
that tuple. Short of using an hex editor, I'm not sure how to do that,
however, and before doing anything that foolish I'd backup the file two
or three times just to be sure.

You may try using pgfsck (http://svana.org/kleptog/pgsql/pgfsck.html) or
pg_filedump (http://sources.redhat.com/rhdb) and see how lucky you get
with the hex editor ...
(memories of cheating in VGA Planets by use of said hex editor many
years ago now come to my mind ...)

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Major Fambrough: You wish to see the frontier?
John Dunbar: Yes sir, before it's gone.
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 23 '05 #8
Alvaro Herrera wrote:
On Tue, Sep 14, 2004 at 10:01:21PM -0400, Matthew T. O'Connor wrote:
I don't know if we are making progress but I am getting a different
error now :-)

I did the dd command again this time with count=18. Now when I try to
delete the tuple I get this:

dbmail2=# delete from messageblks where oid = 2916427;
ERROR: attempted to delete invisible tuple

Well after using dd to create a few missing pg_clog files, I was finally
able to do a vacuum of the whole database which allowed me to delete the
problematic tuple which allowed me to do a pg_dump of the database! So
finally some progress.

However, I then ran into a new problem while trying to dump another
database. Now I get this:

dbmail=# SELECT * from messageblks ;
ERROR: invalid page header in block 85646 of relation "pg_toast_2353340"

Any ideas on this new issue?
You may try using pgfsck (http://svana.org/kleptog/pgsql/pgfsck.html) or
pg_filedump (http://sources.redhat.com/rhdb) and see how lucky you get
with the hex editor ...


I looked at pgfsck and it seems that pgfsck was last updated for 7.3.
I'll take a look at pg_filedump.

Thanks again,

Matthew

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #9
On Wed, Sep 15, 2004 at 02:37:13PM -0400, Matthew T. O'Connor wrote:
However, I then ran into a new problem while trying to dump another
database. Now I get this:

dbmail=# SELECT * from messageblks ;
ERROR: invalid page header in block 85646 of relation "pg_toast_2353340"

Any ideas on this new issue?


IMO this is FUBAR ... try enabling zero_damaged_pages. Beware that data
on damaged pages will be lost.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #10

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

7 posts views Thread by Stuart McGraw | last post: by
1 post views Thread by Solo | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.