473,750 Members | 2,478 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Database Recovery Procedures

Looks like for the first time in 6 years, I'm experienced some database table
corruption. This was due to the space filling up on a server (you don't want to
know how that happened).

I have 3 tables corrupt and the others are fine (which I dumped to be safe). I
have a backup which I could use but then I realized that maybe there might be
some "surgery" I could perform to get the table "repaired". Note that the
normal recovery that the database does on its own did not work in this case.

I looked through the documentation (Admin 7.3.2) and I thought there was a
disaster recovery section but there is only "recovery" discussed as part of
backup/restore. If this information is out there somewhere else if someone
could provide a link that would be a great help as well.

My question/statement is that I think this is something that is important to
have. At least in regards to different strategies one could try to surgically
recover data BEFORE use the broad sword method of going to a backup. One of the
successful "sell" points I use to my clients is how resilient Linux/Unix
filesystems are. As well as Pg on Linux. In the case here, though I don't have
FS corruption so I'd like to know what should and could I do in this case.

Suggestions?

Oh and here is the output of a "select *" on one of the corrupt tables...

(saved as draft email here on 9/12/03)

...Ok, I was going to paste that in the email but now the database isn't coming
up at all. Here is the start up message

~~~
DEBUG: FindExec: found "/usr/local/pgsql/bin/postgres" using argv[0]
DEBUG: invoking IpcMemoryCreate (size=1466368)
DEBUG: FindExec: found "/usr/local/pgsql/bin/postmaster" using argv[0]
LOG: database system shutdown was interrupted at 2003-09-16 15:11:36 EDT
LOG: checkpoint record is at 5/2D497FC0
LOG: redo record is at 5/2D497FC0; undo record is at 0/0; shutdown TRUE
LOG: next transaction id: 5287090; next oid: 26471
LOG: database system was not properly shut down; automatic recovery in progress
LOG: ReadRecord: unexpected pageaddr 5/27498000 in log file 5, segment 45,
offset 4816896
LOG: redo is not required
PANIC: XLogWrite: write request 5/2D498000 is past end of log 5/2D498000
DEBUG: reaping dead processes
LOG: startup process (pid 17031) was terminated by signal 6
LOG: aborting startup due to startup process failure
DEBUG: proc_exit(1)
DEBUG: shmem_exit(1)
DEBUG: exit(1)
~~~

Thanks in advance to all

--
Keith C. Perry
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com
_______________ _______________ ______
This email account is being host by:
VCSN, Inc : http://vcsn.com

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 11 '05 #1
7 7910
Network Administrator <ne******@vcsn. com> writes:
PANIC: XLogWrite: write request 5/2D498000 is past end of log 5/2D498000


I'll bet you are running 7.3.3. You need to update to 7.3.4 to escape
this startup bug.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 11 '05 #2
Ahhh, we it just so happens that I have 7.3.4 waiting to be built. I was trying
to wait up 7.4 was released but other than this startup bug does 7.3.4 do a
better job of recovery?

*pause*

Ok, wait- at this point I have to go to 7.3.4 because I would not be able to go
to 7.4 'cause I need the dump/restore process that we've been talking about
first... so, I'll do that and when/if I have a problem, I'll repost :)

Quoting Tom Lane <tg*@sss.pgh.pa .us>:
Network Administrator <ne******@vcsn. com> writes:
PANIC: XLogWrite: write request 5/2D498000 is past end of log 5/2D498000


I'll bet you are running 7.3.3. You need to update to 7.3.4 to escape
this startup bug.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

--
Keith C. Perry
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

_______________ _______________ ______
This email account is being host by:
VCSN, Inc : http://vcsn.com

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #3
Ok, upgrade done and the system is up but when try hit database through the app
(browser front end just does selects) on the debug screen (level 3) I get this...

DEBUG: child process (pid 21248) was terminated by signal 6
LOG: server process (pid 21248) was terminated by signal 6
LOG: terminating any other active server processes
LOG: all server processes terminated; reinitializing shared memory and semaphores
DEBUG: shmem_exit(0)
DEBUG: invoking IpcMemoryCreate (size=1466368)
LOG: database system was interrupted at 2003-09-17 10:52:16 EDT
LOG: checkpoint record is at 5/2D498110
LOG: redo record is at 5/2D498110; undo record is at 0/0; shutdown TRUE
LOG: next transaction id: 5287090; next oid: 26471
LOG: database system was not properly shut down; automatic recovery in progress
LOG: ReadRecord: record with zero length at 5/2D498150
LOG: redo is not required
LOG: database system is ready
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)

...if I using the psql client, and issue a "select * from <corrupt table name>
limit 5" if get this...
PANIC: read of clog file 5, offset 16384 failed: Success
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!#

...I've never seen the prompt got to "!#"
So back to my original question. What are the recover procedures (if any) that
should be tried before I grab my PGDATA path from tape?
Quoting Tom Lane <tg*@sss.pgh.pa .us>:
Network Administrator <ne******@vcsn. com> writes:
PANIC: XLogWrite: write request 5/2D498000 is past end of log 5/2D498000


I'll bet you are running 7.3.3. You need to update to 7.3.4 to escape
this startup bug.

regards, tom lane

--
Keith C. Perry
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

_______________ _______________ ______
This email account is being host by:
VCSN, Inc : http://vcsn.com

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 11 '05 #4
Network Administrator <ne******@vcsn. com> writes:
..if I using the psql client, and issue a "select * from <corrupt table name>
limit 5" if get this... PANIC: read of clog file 5, offset 16384 failed: Success
Hm, not good :-(. What files actually exist in $PGDATA/pg_clog/ (names
and sizes)?
So back to my original question. What are the recover procedures (if
any) that should be tried before I grab my PGDATA path from tape?


You may be up against having to do that, but some investigation first
seems called for.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #5
Quoting Tom Lane <tg*@sss.pgh.pa .us>:
Network Administrator <ne******@vcsn. com> writes:
..if I using the psql client, and issue a "select * from <corrupt table name>
limit 5" if get this...

PANIC: read of clog file 5, offset 16384 failed: Success


Hm, not good :-(. What files actually exist in $PGDATA/pg_clog/ (names
and sizes)?


Here you go...

-rw------- 1 postgres users 262144 Jul 20 15:53 0000
-rw------- 1 postgres users 262144 Jul 31 12:57 0001
-rw------- 1 postgres users 262144 Aug 12 17:32 0002
-rw------- 1 postgres users 262144 Aug 26 00:15 0003
-rw------- 1 postgres users 262144 Sep 9 23:44 0004
-rw------- 1 postgres users 16384 Sep 10 21:21 0005

So back to my original question. What are the recover procedures (if
any) that should be tried before I grab my PGDATA path from tape?


You may be up against having to do that, but some investigation first
seems called for.


Yep, its ready to go. When and if nothing surgical can be done.
regards, tom lane

--
Keith C. Perry
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

_______________ _______________ ______
This email account is being host by:
VCSN, Inc : http://vcsn.com

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #6
Network Administrator <ne******@vcsn. com> writes:
PANIC: read of clog file 5, offset 16384 failed: Success
Hm, not good :-(. What files actually exist in $PGDATA/pg_clog/ (names
and sizes)?
-rw------- 1 postgres users 262144 Jul 20 15:53 0000
-rw------- 1 postgres users 262144 Jul 31 12:57 0001
-rw------- 1 postgres users 262144 Aug 12 17:32 0002
-rw------- 1 postgres users 262144 Aug 26 00:15 0003
-rw------- 1 postgres users 262144 Sep 9 23:44 0004
-rw------- 1 postgres users 16384 Sep 10 21:21 0005


Okay, it's trying to read off the end of the clog, no doubt looking for
a transaction number just slightly larger than what's known to clog.
This probably indicates more serious problems (because WAL replay really
should have prevented such an inconsistency), but you can get past the
immediate panic relatively easily: just append an 8k page of zeroes to
clog. Assuming your system has /dev/zero, something like this should
do it:

dd bs=8k count=1 < /dev/zero >> $PGDATA/pg_clog/0005

(do this with care of course, and you should probably shut down the
postmaster first). You might possibly have to add more than one page,
if you then get similar PANICs with larger offsets, but try one page
for starters.

If this does suppress the failure messages, you are still not really out
of the woods; you should do what you can to check for data consistency.
A paranoid person would probably take a complete pg_dump and try to diff
it against the last known good dump. At the very least, I'd treat the
table involved in the problem with great suspicion.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 11 '05 #7
Quoting Tom Lane <tg*@sss.pgh.pa .us>:
Network Administrator <ne******@vcsn. com> writes:
PANIC: read of clog file 5, offset 16384 failed: Success Hm, not good :-(. What files actually exist in $PGDATA/pg_clog/ (names
and sizes)?
-rw------- 1 postgres users 262144 Jul 20 15:53 0000
-rw------- 1 postgres users 262144 Jul 31 12:57 0001
-rw------- 1 postgres users 262144 Aug 12 17:32 0002
-rw------- 1 postgres users 262144 Aug 26 00:15 0003
-rw------- 1 postgres users 262144 Sep 9 23:44 0004
-rw------- 1 postgres users 16384 Sep 10 21:21 0005


Okay, it's trying to read off the end of the clog, no doubt looking for
a transaction number just slightly larger than what's known to clog.
This probably indicates more serious problems (because WAL replay really
should have prevented such an inconsistency), but you can get past the
immediate panic relatively easily: just append an 8k page of zeroes to
clog. Assuming your system has /dev/zero, something like this should
do it:

dd bs=8k count=1 < /dev/zero >> $PGDATA/pg_clog/0005

(do this with care of course, and you should probably shut down the
postmaster first). You might possibly have to add more than one page,
if you then get similar PANICs with larger offsets, but try one page
for starters.


Well whatdoyaknow! That did it- EVERYTHING is there! I only needed the one
page. I'm going to have to read up on pg_clog (WAL) so that I understand what
it does a little better.

It makes total sense too because obvious if there is no more space to write too
anything on disk get "frozen" where as application is just going to keep moving
along. At least for a time.
If this does suppress the failure messages, you are still not really out
of the woods; you should do what you can to check for data consistency.
A paranoid person would probably take a complete pg_dump and try to diff
it against the last known good dump. At the very least, I'd treat the
table involved in the problem with great suspicion.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings


Well, I fortunately was on the side of the road and not in the woods in this
case. I'm assuming the next thing to do is run a vacuum analyse and they a dump
all. I'll see how it things perform over the next 48 hours or so.

Now that we've done that, how should should this surgery be documented? I would
think a "tip" like this should be somewhere in the docs (not just the archive).
I'd be more than will to write up this case but I'm trying to establish a long
goal here- if I may be so bold as to suggest one.

Thanks again.

--
Keith C. Perry
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

_______________ _______________ ______
This email account is being host by:
VCSN, Inc : http://vcsn.com

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postg resql.org

Nov 11 '05 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
9617
by: Edwinah63 | last post by:
Hi Gurus, i am having problems with restoring a ms sql database. i have restored the database using veritas to a different location ('g:\datafiles') in no recover mode. when i view the database through the Enterprise Manager, it shows the database as silver icon (loading).
2
11173
by: Jens Kalkbrenner | last post by:
MS SQL-Server 7.0 Bypassing recovery for database 'EfW_765' because it is marked IN LOAD. What does this mean? Our customer is backing up is maindatabase and is recovering it to this database for testing. Our custumer tries it serveral times and then the recovery works and the data are corrupt. I have written a little programm which does some selects to this database. The program is stopped during recovery but our customer beleves that...
4
11613
by: serge | last post by:
Running SQL Server 2000 Enterprise Edition SP3. The database is also used by Microsoft Project Server 2002 and also has OLAP views, so the database is being used to view/run cubes in the Analysis Manager. What is the best way of shrinking the database size and its log file too? Is there an automatic way to do this with a maintenance plan or i have to manually run a SQL statement periodically? What are the best practices?
0
1978
by: Sean Powell | last post by:
I am a SQL Server DBA Manager and my employer (Symantec) will be opening a position in the near future (20-60 days) for my DBA team. Here is the job description: ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Sr Security Infrastructure Adm Calgary, Canada Consulting Job...
2
2109
by: Matt | last post by:
I run SQL Server 2000 and use thier database maintenance plans to backup my databases. My questions is what happens if a change is being made to a database table while a backup is running? Should I be locking the databases before the backup begins? Scenario: 1) Database Plan begins backup at 7:00 PM. 2) At 7:01 PM a web user updates their password while the backup is taking place.
1
2339
by: pginfo | last post by:
Hi, I have had two times problem with pg database corruption. My system: Dual athlon 2.4 1 GB RAM, linux red hat 7.3 reiserFS Also on the system I have cron that every 3 h. make dump and every 24 h.
346
16584
by: rkusenet | last post by:
http://biz.yahoo.com/rc/040526/tech_database_marketshare_1.html Interesting to see that database sales for windows is more than Unix.
4
8385
by: Knokmans | last post by:
Hi, All of this is still a test envinonment, but has to go on production.... I restore a database a database the follwing way db2 restore db <database> from <directory. taken at <timestamp> replace existing Then i only rollforward the (minumum) logs which are needed for recovery db2 rollforward db <database> to end of logs and complete The database in NOT in any pending state anymore
2
5420
by: Scott Jones | last post by:
Hello, In an attempt to save disk space, I went in and gzipped all of the log files for my database running DB2 version 7.2 . This resulted in the database being marked bad because a log file was no longer found (I think). Here is an excerpt from db2diag.log: String Title:sqleMarkDBad: PID:7980 Node:000
0
9001
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8839
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
1
9344
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8264
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
6081
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4893
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3327
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
2807
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2226
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.