Corrupted Data?

Hello, pg_dump started failing for one of my databases, so I looked into
it, and it appears that I have some corrupted data. I assume this is
related to a failed hard disk that was part of the Linux software RAID
mirror.

I backed up the entire data directory, and did a pg_resetxlog, but that
didn't help. I found the specific row that seems to be the problem, but
I can't delete it.

Anyway, I don't know how to fix this, so if you could please help, I
would appreciate it.

Details are as follows:

[dbmail2@dezeut dbmail2]$ psql
Welcome to psql 7.4.2, the PostgreSQL interactive terminal.
dbmail2=# SELECT version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 7.4.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.2
20030222 (Red Hat Linux 3.2.2-5)

dbmail2=# SELECT oid,messageblk_idnr, physmessage_id, blocksize from
messageblks where messageblk_idnr =7718;
   oid   | messageblk_idnr | physmessage_id | blocksize
---------+-----------------+----------------+-----------
 2916427 |            7718 |           3842 |    524288
(1 row)

dbmail2=# SELECT oid,messageblk_idnr, physmessage_id, blocksize,
messageblk from messageblks where messageblk_idnr =7718;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q
[dbmail2@dezeut dbmail2]$ psql
Welcome to psql 7.4.2, the PostgreSQL interactive terminal.
dbmail2=# delete from messageblks where oid = 2916427;
ERROR: could not access status of transaction 3822646358
DETAIL: could not open file "/var/lib/pgsql/data/pg_clog/0E3D": No such
file or directory


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 23 '05 #1
On Mon, 13 Sep 2004, Matthew T. O'Connor wrote:
I backed up the entire data directory, and did a pg_resetxlog, but that
didn't help. I found the specific row that seems to be the problem, but
I can't delete it.


I have used TRUNCATE on the table in this situation to recover. Another
option might be to DROP the table. Or perhaps restore from backups.


Nov 23 '05 #2
On Tue, 2004-09-14 at 00:46, Chester Kustarz wrote:
On Mon, 13 Sep 2004, Matthew T. O'Connor wrote:
I backed up the entire data directory, and did a pg_resetxlog, but that
didn't help. I found the specific row that seems to be the problem, but
I can't delete it.


I have used TRUNCATE on the table in this situation to recover. Another
option might be to DROP the table. Or perhaps restore from backups.


I would really prefer not to do that as pg_dump has apparently been
failing for a while so I would lose a fair amount of data.

Nov 23 '05 #3
On Tue, Sep 14, 2004 at 08:13:24AM -0400, Matthew T. O'Connor wrote:
On Tue, 2004-09-14 at 00:46, Chester Kustarz wrote:
On Mon, 13 Sep 2004, Matthew T. O'Connor wrote:
I backed up the entire data directory, and did a pg_resetxlog, but that
didn't help. I found the specific row that seems to be the problem, but
I can't delete it.


I have used TRUNCATE on the table in this situation to recover. Another
option might be to DROP the table. Or perhaps restore from backups.


I would really prefer not to do that as pg_dump has apparently been
failing for a while so I would lose a fair amount of data.


You can create a pg_clog file (the one it's complaining about) filled
with zeros, using
dd if=/dev/zero bs=1k count=8 of=/path/to/data/pg_clog/0E3D

and then you should be able to pg_dump the table (or at least find out
if there is another corrupted tuple.) Beware that the corrupted tuple
may be in there if it's supposed not to be, or it may not be if it's
supposed to be.

After you get your data back, I'd suggest running the usual hardware
checking tools, and restoring from the backup.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Use it up, wear it out, make it do, or do without"
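
A minimal sketch of the dd step suggested above, wrapped so that it refuses to overwrite a file that already exists. The function name make_zero_clog and the scratch directory are hypothetical, added only for illustration, and the 8192-byte page size assumes a default PostgreSQL build. The demonstration writes one page (the same 8 kB as bs=1k count=8) into a temporary directory rather than a live data directory; how many pages a given error actually needs depends on the transaction ID it names.

```shell
#!/bin/sh
# make_zero_clog DIR SEGMENT PAGES
# Create DIR/SEGMENT as PAGES zero-filled 8192-byte pages.
# Hypothetical helper: the name and the overwrite guard are not from the thread.
make_zero_clog() {
    dir=$1
    segment=$2
    pages=$3
    target="$dir/$segment"
    # Never clobber an existing status file; it may still hold real data.
    if [ -e "$target" ]; then
        echo "refusing to overwrite existing $target" >&2
        return 1
    fi
    # 8192 bytes per page is an assumption (default BLCKSZ).
    dd if=/dev/zero of="$target" bs=8192 count="$pages" 2>/dev/null
}

# Demonstration in a scratch directory, not in /var/lib/pgsql/data/pg_clog.
scratch=$(mktemp -d)
make_zero_clog "$scratch" 0E3D 1
echo "created $(( $(wc -c < "$scratch/0E3D") )) bytes"
rm -rf "$scratch"
```

Against a real cluster this would presumably be run with the server stopped, and only after the data directory has been backed up, as the original poster did.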

Nov 23 '05 #4
Alvaro Herrera wrote:
On Tue, Sep 14, 2004 at 08:13:24AM -0400, Matthew T. O'Connor wrote:
You can create a pg_clog file (the one it's complaining about) filled
with zeros, using
dd if=/dev/zero bs=1k count=8 of=/path/to/data/pg_clog/0E3D


Ok, I tried this, and it changed the error but hasn't fixed the problem
now I get this:

[dbmail2@dezeut dbmail2]$ psql
Welcome to psql 7.4.2, the PostgreSQL interactive terminal.
dbmail2=# delete from messageblks where messageblk_idnr = 7718;
ERROR: could not access status of transaction 3822646358
DETAIL: could not read from file "/var/lib/pgsql/data/pg_clog/0E3D" at
offset 139264: Success

And in the log file I get this:

ERROR: XX000: could not access status of transaction 3822646358
DETAIL: could not read from file "/var/lib/pgsql/data/pg_clog/0E3D" at
offset 139264: Success
LOCATION: SlruReportIOError, slru.c:634

Any more thoughts?

Thanks again,

Matthew


Nov 23 '05 #5
On Tue, Sep 14, 2004 at 08:01:13PM -0400, Matthew T. O'Connor wrote:
Alvaro Herrera wrote:
On Tue, Sep 14, 2004 at 08:13:24AM -0400, Matthew T. O'Connor wrote:
You can create a pg_clog file (the one it's complaining about) filled
with zeros, using
dd if=/dev/zero bs=1k count=8 of=/path/to/data/pg_clog/0E3D


Ok, I tried this, and it changed the error but hasn't fixed the problem
now I get this:

[dbmail2@dezeut dbmail2]$ psql
Welcome to psql 7.4.2, the PostgreSQL interactive terminal.
dbmail2=# delete from messageblks where messageblk_idnr = 7718;
ERROR: could not access status of transaction 3822646358
DETAIL: could not read from file "/var/lib/pgsql/data/pg_clog/0E3D" at
offset 139264: Success


Huh, sorry, the directions only created the first block of the file, but
you needed the 17th ...

dd if=/dev/zero bs=8k count=17 of=/path/to/data/pg_clog/0E3D

I may be subject to a fencepost error here, so if it doesn't work, try
with 18.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Limítate a mirar... y algun día veras"
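
The page count can be worked out from the transaction ID in the error rather than guessed. The sketch below assumes the layout of a default build: 2 status bits per transaction (4 per byte), 8192-byte pages, and 32 pages per pg_clog segment file. Those constants are assumptions, not something stated in this thread, but they reproduce both the file name (0E3D) and the read offset (139264) reported in the error. clog_locate is a hypothetical helper name.

```shell
#!/bin/sh
# Assumed constants for a default PostgreSQL build (not stated in the thread).
PAGE_BYTES=8192        # bytes per clog page
XACTS_PER_BYTE=4       # 2 status bits per transaction
PAGES_PER_SEGMENT=32   # pages per pg_clog segment file

# clog_locate XID
# Prints: SEGMENT_NAME PAGE_INDEX BYTE_OFFSET MIN_DD_COUNT
clog_locate() {
    xid=$1
    xacts_per_page=$((PAGE_BYTES * XACTS_PER_BYTE))
    xacts_per_segment=$((xacts_per_page * PAGES_PER_SEGMENT))
    # Segment files are named by segment number in 4-digit uppercase hex.
    segment=$((xid / xacts_per_segment))
    # Zero-based page index inside that segment file.
    page=$(( (xid % xacts_per_segment) / xacts_per_page ))
    # Byte offset at which the server starts reading that page.
    offset=$((page * PAGE_BYTES))
    # dd with bs=8k needs count >= page + 1 for the page to exist.
    printf '%04X %d %d %d\n' "$segment" "$page" "$offset" "$((page + 1))"
}

clog_locate 3822646358
# prints: 0E3D 17 139264 18
```

The page index is zero-based, so page 17 is the 18th page of the file. A file made with count=17 ends exactly at offset 139264, one page short of what the server tries to read.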

Nov 23 '05 #6
Alvaro Herrera wrote:
Huh, sorry, the directions only created the first block of the file, but
you needed the 17th ...

dd if=/dev/zero bs=8k count=17 of=/path/to/data/pg_clog/0E3D

I may be subject to a fencepost error here, so if it doesn't work, try
with 18.


I don't know if we are making progress but I am getting a different
error now :-)

I did the dd command again this time with count=18. Now when I try to
delete the tuple I get this:

dbmail2=# delete from messageblks where oid = 2916427;
ERROR: attempted to delete invisible tuple

The Postmaster log has this to say:

ERROR: XX000: attempted to delete invisible tuple
LOCATION: heap_delete, heapam.c:1258

Thanks again for the help.

Matthew


Nov 23 '05 #7
On Tue, Sep 14, 2004 at 10:01:21PM -0400, Matthew T. O'Connor wrote:
Alvaro Herrera wrote:
Huh, sorry, the directions only created the first block of the file, but
you needed the 17th ...

dd if=/dev/zero bs=8k count=17 of=/path/to/data/pg_clog/0E3D

I may be subject to a fencepost error here, so if it doesn't work, try
with 18.


I don't know if we are making progress but I am getting a different
error now :-)

I did the dd command again this time with count=18. Now when I try to
delete the tuple I get this:

dbmail2=# delete from messageblks where oid = 2916427;
ERROR: attempted to delete invisible tuple


I think I know what is going on, but I'm not sure how to solve the
problem. If I were in your situation I'd edit the data file and stash
FrozenTransactionId in the Xmin and InvalidTransactionId in Xmax for
that tuple. Short of using a hex editor, I'm not sure how to do that,
however, and before doing anything that foolish I'd back up the file two
or three times just to be sure.

You may try using pgfsck (http://svana.org/kleptog/pgsql/pgfsck.html) or
pg_filedump (http://sources.redhat.com/rhdb) and see how lucky you get
with the hex editor ...
(memories of cheating in VGA Planets by use of said hex editor many
years ago now come to my mind ...)

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Major Fambrough: You wish to see the frontier?
John Dunbar: Yes sir, before it's gone.

Nov 23 '05 #8
Alvaro Herrera wrote:
On Tue, Sep 14, 2004 at 10:01:21PM -0400, Matthew T. O'Connor wrote:
I don't know if we are making progress but I am getting a different
error now :-)

I did the dd command again this time with count=18. Now when I try to
delete the tuple I get this:

dbmail2=# delete from messageblks where oid = 2916427;
ERROR: attempted to delete invisible tuple

Well after using dd to create a few missing pg_clog files, I was finally
able to do a vacuum of the whole database which allowed me to delete the
problematic tuple which allowed me to do a pg_dump of the database! So
finally some progress.

However, I then ran into a new problem while trying to dump another
database. Now I get this:

dbmail=# SELECT * from messageblks ;
ERROR: invalid page header in block 85646 of relation "pg_toast_2353340"

Any ideas on this new issue?

You may try using pgfsck (http://svana.org/kleptog/pgsql/pgfsck.html) or
pg_filedump (http://sources.redhat.com/rhdb) and see how lucky you get
with the hex editor ...


I looked at pgfsck and it seems that pgfsck was last updated for 7.3.
I'll take a look at pg_filedump.

Thanks again,

Matthew


Nov 23 '05 #9
On Wed, Sep 15, 2004 at 02:37:13PM -0400, Matthew T. O'Connor wrote:
However, I then ran into a new problem while trying to dump another
database. Now I get this:

dbmail=# SELECT * from messageblks ;
ERROR: invalid page header in block 85646 of relation "pg_toast_2353340"

Any ideas on this new issue?


IMO this is FUBAR ... try enabling zero_damaged_pages. Beware that data
on damaged pages will be lost.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)
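
A rough sketch of what enabling zero_damaged_pages for a single session might look like. zero_damaged_pages is a real server setting that only a superuser can change; salvage_sql is a hypothetical helper that just prints the statements and connects to nothing. The SELECT is the query that failed earlier in the thread, and the database and table names are taken from that post.

```shell
#!/bin/sh
# salvage_sql TABLE
# Print the statements for one salvage session. Hypothetical helper:
# it only writes SQL to stdout and never touches a database.
salvage_sql() {
    table=$1
    cat <<EOF
SET zero_damaged_pages = on;
SELECT * FROM $table;
EOF
}

# Dry run: show what would be sent for the table named in the thread.
salvage_sql messageblks

# To run it for real (as a database superuser, ideally against a copy
# of the data directory), the output could be piped into psql:
#   salvage_sql messageblks | psql dbmail > messageblks.out
```

A SET issued this way applies only to that one session, so a separate pg_dump run would presumably need the setting enabled server-wide in postgresql.conf instead. Either way, rows on any page that gets zeroed are gone, as the warning above says.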

Nov 23 '05 #10
