Bytes | Software Development & Data Engineering Community

How to determine a database is intact?

Wes
On a nightly basis, we shut the database down and do a file system backup.

A short chronology of our database problem:

8/21 - count(*) of user tables succeeded (done once a week to get
statistics)

8/23 - A specific search on a specific value (one out of over 2 million)
caused postmaster to SEGV. I dropped the index in question and rebuilt it.
All appeared ok.

8/28 - count(*) failed - postmaster aborted on a SEGV. I attempted a
pg_dumpall. Postmaster aborted with SEGV about 240 MB into the dump (the
database is about 50 GB).

I reloaded 8/25 database and attempted a pg_dumpall. Same failure.

I reloaded from the 8/21 file system dump. Pg_dumpall worked on this one.
I spent the next almost 48 hours dumping and reloading 240 million rows and
reprocessing the several million additions since 8/21. I didn't dare use
the 8/21 database without reloading because I didn't know if it was good or
not.
Question:

How can we tell that a database is intact? In the above example, pg_dumpall
worked on the 8/21 database. Did it become corrupt between 8/21 and 8/23,
or was it already corrupt and got worse? Pg_dumpall tells you nothing about
the condition of indexes. Could a corrupt index corrupt data blocks?

I'm looking at doing a pg_dumpall on a weekly basis so that we have a point
in time where we know we have a recoverable database. When the database
reaches several hundred GB and over a billion rows, this isn't a great
solution, and doesn't address the overall database integrity.

Back to the original question... How can I verify the complete integrity of
a database - especially a very large one where a reload or full index
rebuild could take on the order of days?
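One cheap check, short of a full restore, is to force a complete read of every heap page by dumping to nowhere. This is only a sketch: a clean exit means every table page was at least readable, but it says nothing about index sanity.

```shell
# Stream the whole cluster through the dump reader and discard the output.
# A SEGV or error here surfaces unreadable table data without needing
# disk space for the dump itself.
if pg_dumpall > /dev/null; then
    echo "all tables readable"
else
    echo "dump aborted - possible corruption" >&2
    exit 1
fi
```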

Wes
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 23 '05 #1
Hmm. I do a nightly dump of our production database, archive a copy
offsite, and verify the quality of the dump by running a little
verification script that is little more than a restore.
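A minimal version of such a verify-by-restore script might look like the following (the database name `proddb` and scratch database `verify_db` are placeholders):

```shell
#!/bin/sh
# Dump production, then restore into a throwaway database.
# set -e aborts on the first error, which is the "verification":
# if the restore completes, the dump is known to be loadable.
set -e
DUMP=/backup/proddb-$(date +%Y%m%d).sql
pg_dump proddb > "$DUMP"
dropdb verify_db 2>/dev/null || true   # ignore "does not exist"
createdb verify_db
psql -q -v ON_ERROR_STOP=1 verify_db < "$DUMP"
echo "restore verified OK"
```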

But if it would take you more than a day to do that, I'm not sure.

-tfo

On Sep 2, 2004, at 3:30 PM, Wes wrote:
Back to the original question... How can I verify the complete
integrity of
a database - especially a very large one where a reload or full index
rebuild could take on the order of days?

Wes


Nov 23 '05 #2
Wes wrote:
On a nightly basis, we shut the database down and do a file system backup.

A short chronology of our database problem: [snip] Question:

How can we tell that a database is intact? In the above example, pg_dumpall
worked on the 8/21 database. Did it become corrupt between 8/21 and 8/23,
or was it already corrupt and got worse? Pg_dumpall tells you nothing about
the condition of indexes. Could a corrupt index corrupt data blocks?

I'm looking at doing a pg_dumpall on a weekly basis so that we have a point
in time where we know we have a recoverable database. When the database
reaches several hundred GB and over a billion rows, this isn't a great
solution, and doesn't address the overall database integrity.

Back to the original question... How can I verify the complete integrity of
a database - especially a very large one where a reload or full index
rebuild could take on the order of days?


You shouldn't have to verify anything. PG's job is to never corrupt your
data, and providing your hardware is good it should do so. If you are
getting problems almost daily that would suggest a RAM/disk problem to
me (sig 11 usually implies RAM). Can't guarantee it's not PG, but its
record of reliability is pretty good.

Steps I'd take:
1. Check your version number against the release notes and see if you
should upgrade. You don't mention your version, but it's always worth
having the last dot-release (7.2.5, 7.3.7, 7.4.5)
2. Schedule time to run memory/disk tests against your hardware. Finding
48 hours might not be easy, but you need to know where you stand.
3. Set up Slony or some other replication so you can schedule your downtime.
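For step 2, the usual suspects are a RAM soak test and disk self-tests. The tool names below are assumptions; use whatever your platform provides:

```shell
# Lock and repeatedly pattern-test 1 GB of RAM, 5 passes (run as root).
memtester 1024M 5

# Kick off a long SMART self-test on each drive in the array,
# then check the results once it finishes.
smartctl -t long /dev/sda
smartctl -l selftest /dev/sda
```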

--
Richard Huxton
Archonet Ltd


Nov 23 '05 #3
Wes
On 9/3/04 3:11 AM, "Richard Huxton" <de*@archonet.com> wrote:
You shouldn't have to verify anything. PG's job is to never corrupt your
data, and providing your hardware is good it should do so. If you are
getting problems almost daily that would suggest a RAM/disk problem to
me (sig 11 usually implies RAM). Can't guarantee it's not PG, but its
record of reliability is pretty good.
I believe SEGV typically just indicates it dereferenced a bad pointer (i.e.
NULL or out of range). The problem is not occurring on a daily basis. The
database has been in service since December of last year. It's just that
the symptoms progressed from no apparent symptoms, to a clearly corrupt DB.
My guess is that some minor corruption fed upon itself until the DB couldn't
even be dumped.
Steps I'd take:
1. Check your version number against the release notes and see if you
should upgrade. You don't mention your version, but it's always worth
having the last dot-release (7.2.5, 7.3.7, 7.4.5)
2. Schedule time to run memory/disk tests against your hardware. Finding
48 hours might not be easy, but you need to know where you stand.
3. Set up Slony or some other replication so you can schedule your downtime.


I thought I mentioned the level in my original mail - 7.4.1. We are
planning on running some diagnostics.

Whether there is a bug in PostgreSQL, or there was a memory hit, or whatever
doesn't really matter to the original question. The database can become
corrupt. How can I tell that a database is fully intact at any given point
in time? If I reload from a system backup before the known corruption, how
can I be sure that the original corruption that precipitated the failure is
not still there and will again rear its ugly head?

Wes

Nov 23 '05 #4
On 9/3/2004 10:59 AM, Wes wrote:
On 9/3/04 3:11 AM, "Richard Huxton" <de*@archonet.com> wrote:
You shouldn't have to verify anything. PG's job is to never corrupt your
data, and providing your hardware is good it should do so. If you are
getting problems almost daily that would suggest a RAM/disk problem to
me (sig 11 usually implies RAM). Can't guarantee it's not PG but it's
record of reliability is pretty good.
I believe SEGV typically just indicates it dereferenced a bad pointer (i.e.
NULL or out of range). The problem is not occurring on a daily basis. The
database has been in service since December of last year. It's just that
the symptoms progressed from no apparent symptoms, to a clearly corrupt DB.
My guess is that some minor corruption fed upon itself until the DB couldn't
even be dumped.


Right, that's what a SIGSEGV is. And the usual reason for the bad value
in that pointer is bad memory. What do you base your guess of a
self-multiplying corruption on? Or is this pure handwaving in self-defense?
Steps I'd take:
1. Check your version number against the release notes and see if you
should upgrade. You don't mention your version, but it's always worth
having the last dot-release (7.2.5, 7.3.7, 7.4.5)
2. Schedule time to run memory/disk tests against your hardware. Finding
48 hours might not be easy, but you need to know where you stand.
3. Set up Slony or some other replication so you can schedule your downtime.
I thought I mentioned the level in my original mail - 7.4.1. We are
planning on running some diagnostics.


So you are running a release that had 4 official bugfix releases from
the vendor, on hardware that is in an unknown condition? Is the server at
least configured with ECC RAM, or is the data not important enough to
justify quality hardware?

Whether there is a bug in PostgreSQL, or there was a memory hit, or whatever
doesn't really matter to the original question. The database can become
corrupt. How can I tell that a database is fully intact at any given point
in time? If I reload from a system backup before the known corruption, how
can I be sure that the original corruption that precipitated the failure is
not still there and will again rear its ugly head?


Dump and restore. You don't need to restore onto the same server. Any
test system with enough disk space would do.
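If disk space for an intermediate dump file is the constraint, the dump can be piped straight into the restore on the test box in a single pass (the host and database names below are hypothetical):

```shell
# Restore onto a separate test machine without writing a dump file.
# Any crash or error during the pipe flags the dump as unloadable.
createdb -h testhost proddb_check
pg_dump proddb | psql -q -h testhost proddb_check
```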
Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#=========================================== Ja******@Yahoo.com        #


Nov 23 '05 #5
Wes
On 9/4/04 5:53 AM, "Jan Wieck" <Ja******@yahoo.com> wrote:
[snip]
Well, I had a big long response composed to your snide remarks, but decided
I'm not getting in a flame war with you on something that is irrelevant to
the question I posed.
Is the server at
least configured with ECC RAM, or is the data not important enough to
justify quality hardware?


As a matter of fact it does have ECC, and no errors have been reported by
the system or diagnostics. It's a Dell 6650. No disk errors have been
reported either (hardware RAID controller).

As it pertains to the question at hand, I don't care what caused the
corruption. It's totally irrelevant to the question. Perfect hardware and
perfect software don't exist. There will be hardware failures. There will
be software failures. The question was whether or not there is a feasible
way of determining at any given point in time that a database is fully
intact. We are already now doing weekly pg_dumpall's. Doing a restore of
each of those simply isn't reasonable. If there is no such integrity
utility to scan the DB in place, then that's the way it is. That's the
answer to my question. But quit trying to blame crappy hardware.

Wes

Nov 23 '05 #6
On 9/4/2004 2:26 PM, Wes wrote:
On 9/4/04 5:53 AM, "Jan Wieck" <Ja******@yahoo.com> wrote:

intact. We are already now doing weekly pg_dumpall's. Doing a restore of
each of those simply isn't reasonable. If there is no such integrity


Why isn't doing a restore of those reasonable?
Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#=========================================== Ja******@Yahoo.com        #


Nov 23 '05 #7
Wes
On 9/4/04 2:42 PM, "Jan Wieck" <Ja******@yahoo.com> wrote:
Why isn't doing a restore of those reasonable?


Because of the size and time required. Right now, it takes at least 24
hours, with a full hardware configuration (multi-CPU, 8 disk SCSI RAID,
etc). That is going to do nothing but increase. Extrapolating linearly the
*current* load, it will take at least 4 days to load when the database size
peaks. But, based on past experience, the load will increase significantly
before then (probably by a factor of 2-4 to be conservative). When the
database gets that large, I have to consider that the reload time may not
increase linearly with the size. If we do a pg_dumpall once a week, it will
take longer to do a reload than the period between dumps. Just the pg_dump
alone could easily take close to a day. It also requires we have duplicate
fully configured hardware for each copy of the database we run just to
verify a pg_dumpall - if it takes that long to load, I can't share hardware.
Add to that the people time to monitor the systems and the process of 2x the
hardware... In short, I need a solution that scales to huge databases
(hundreds of gigabytes to over a terabyte), not one that works just for
small to medium databases.

pg_dumpall is hopefully reliable and will presumably give me a snapshot that I
know I can restore from if the database becomes hopelessly corrupt. But I
can't individually test each one. I was hoping for a utility that would go
through and verify all indexes and data are consistent, and if not, attempt
to correct them.

As for your earlier question of cascading errors, consider a file system - a
type of database. If you get a file system error and correct it quickly,
you usually will lose nothing. If, however, you ignore that error, it is
likely to get worse over days or weeks. Other errors will crop up as a
result of the bad information in the first one. At some point, the file
system corruption may become so bad that it can't be recovered. Format and
reload. I have seen this on NTFS, UFS, HFS/HFS+, and even ReiserFS.
Journaling greatly reduces, but doesn't eliminate, this problem. There are
tools that will scan your file system and guarantee its integrity, or fix
the errors (or attempt to fix them) if it finds any. I was looking for
something similar for a Postgres database.
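Lacking a real fsck-style utility, one crude in-place pass is to read every table twice, once favouring sequential scans and once favouring index scans, and compare the row counts. This is only a sketch: whether the planner actually walks an index for a bare count(*) depends on the version and on which indexes exist, and the database name `mydb` is a placeholder.

```shell
# Compare seqscan and indexscan counts for every table in "public".
# A mismatch, error, or backend crash on the second query points at
# a damaged index, which a REINDEX of that table may repair.
for t in $(psql -At -c "SELECT tablename FROM pg_tables WHERE schemaname = 'public'" mydb); do
  seq=$(psql -At -c "SET enable_indexscan = off; SELECT count(*) FROM $t;" mydb)
  idx=$(psql -At -c "SET enable_seqscan = off; SELECT count(*) FROM $t;" mydb)
  if [ "$seq" != "$idx" ]; then
    echo "count mismatch on $t: $seq (seqscan) vs $idx (indexscan)" >&2
  fi
done
```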

Wes

Nov 23 '05 #8
Hi,

On Sat, 04.09.2004 at 22:51, Wes wrote:
On 9/4/04 2:42 PM, "Jan Wieck" <Ja******@yahoo .com> wrote:
Why isn't doing a restore of those reasonable?
Because of the size and time required. Right now, it takes at least 24
hours, with a full hardware configuration (multi-CPU, 8 disk SCSI RAID,
etc). That is going to do nothing but increase. Extrapolating linearly the
*current* load, it will take at least 4 days to load when the database size
peaks.

....
As for your earlier question of cascading errors, consider a file system - a
type of database. If you get a file system error and correct it quickly,
you usually will lose nothing. If, however, you ignore that error, it is
likely to get worse over days or weeks. Other errors will crop up as a
result of the bad information in the first one. At some point, the file
system corruption may become so bad that it can't be recovered. Format and
reload. I have seen this on NTFS, UFS, HFS/HFS+, and even ReiserFS.
Journaling greatly reduces, but doesn't eliminate, this problem. There are
tools that will scan your file system and guarantee its integrity, or fix
the errors (or attempt to fix them) if it finds any. I was looking for
something similar for a Postgres database.


Well, with such a huge database you probably should consider
different backup strategies, a filesystem with snapshot
support (XFS?) could help where you can copy a state of the database
at any time - so you can backup the database cluster without
stopping the postmaster. Also replication via slony could be
an option.
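With LVM underneath the filesystem, the snapshot approach might look like this (volume and mount names are placeholders; filesystems with native snapshots have their own tools):

```shell
# Freeze a point-in-time image of the data volume, archive it, drop it.
lvcreate --snapshot --size 5G --name pgsnap /dev/vg0/pgdata
mount -o ro /dev/vg0/pgsnap /mnt/pgsnap
tar czf /backup/pgdata-$(date +%Y%m%d).tar.gz -C /mnt/pgsnap .
umount /mnt/pgsnap
lvremove -f /dev/vg0/pgsnap
```

Restoring such an image behaves like recovering from a crash: the postmaster replays its WAL on startup to reach a consistent state.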

The best tool to verify the backup is probably the postmaster
itself. I really doubt any other program would be smaller and
faster :)

(Filesystems provide a tool because the actual filesystem code
is a kernel module)

Regards
Tino

Nov 23 '05 #9
Wes
On 9/4/04 5:28 PM, "Tino Wildenhain" <ti**@wildenhain.de> wrote:
Well, with such a huge database you probably should consider
different backup strategies, a filesystem with snapshot
support (XFS?) could help where you can copy a state of the database
at any time - so you can backup the database cluster without
stopping the postmaster. Also replication via slony could be
an option.
Yes, we are looking into using file system snapshots. We are currently
using primarily file system backups (shut down the DB, back up the file
system). The problem we ran into was that we didn't have a specific point
in time where we knew with absolute certainty the backed up database was
good - snapshots would not help here.

I ended up starting with a recent backup, and working backwards until I
found one that wouldn't crash postmaster on a pg_dumpall. Rather than trust
that there was no corruption in that version (data blocks might be good, but
pg_dumpall doesn't test indexes), I did a pg_dumpall and reload.
The best tool to verify the backup is probably the postmaster
itself. I really doubt any other program would be smaller and
faster :)


Not really... Postmaster won't tell you if a structure is bad until it
stumbles on it and crashes (or politely reports an error). Just because
postmaster comes up doesn't mean your database is good. As far as I know,
there is no "verify database" command option on postmaster - postmaster
won't fsck your database.

Wes

Nov 23 '05 #10

