How to determine a database is intact?

Wes
On a nightly basis, we shut the database down and do a file system backup.

A short chronology of our database problem:

8/21 - count(*) of user tables succeeded (done once a week to get
statistics)

8/23 - A specific search on a specific value (one out of over 2 million)
caused postmaster to SEGV. I dropped the index in question and rebuilt it.
All appeared ok.

8/28 - count(*) failed - postmaster aborted on a SEGV. I attempted a
pg_dumpall. Postmaster aborted with SEGV about 240 MB into the dump (the
database is about 50 GB).

I reloaded the 8/25 database and attempted a pg_dumpall. Same failure.

I reloaded from the 8/21 file system dump. Pg_dumpall worked on this one.
I spent the next almost 48 hours dumping and reloading 240 million rows and
reprocessing the several million additions since 8/21. I didn't dare use
the 8/21 database without reloading because I didn't know if it was good or
not.
Question:

How can we tell that a database is intact? In the above example, pg_dumpall
worked on the 8/21 database. Did it become corrupt between 8/21 and 8/23,
or was it already corrupt and got worse? Pg_dumpall tells you nothing about
the condition of indexes. Could a corrupt index corrupt data blocks?

I'm looking at doing a pg_dumpall on a weekly basis so that we have a point
in time where we know we have a recoverable database. When the database
reaches several hundred GB and over a billion rows, this isn't a great
solution, and doesn't address the overall database integrity.

Back to the original question... How can I verify the complete integrity of
a database - especially a very large one where a reload or full index
rebuild could take on the order of days?

Wes

Nov 23 '05 #1
Hmm. I do a nightly dump of our production database, archive a copy
offsite, and verify the quality of the dump by running a little
verification script that is little more than a restore.
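Something along these lines would do it (the database names and paths below are just placeholders, not my actual script):

#!/bin/sh
# Nightly dump-and-verify sketch: dump production, reload it into a scratch
# database, and fail loudly if the restore reports any errors.
set -e

DUMPFILE=/backup/proddb-`date +%Y%m%d`.sql
pg_dump -U postgres proddb > "$DUMPFILE"

# Rebuild the scratch database and keep only ERROR lines from the reload.
dropdb -U postgres verifydb 2>/dev/null || true
createdb -U postgres verifydb
psql -U postgres -d verifydb -f "$DUMPFILE" 2>&1 >/dev/null \
    | grep 'ERROR:' > /tmp/verify.err || true

if [ -s /tmp/verify.err ]; then
    echo "restore of $DUMPFILE reported errors:" >&2
    cat /tmp/verify.err >&2
    exit 1
fi
echo "$DUMPFILE restored cleanly"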

But if it would take you more than a day to do that, I'm not sure.

-tfo

On Sep 2, 2004, at 3:30 PM, Wes wrote:
Back to the original question... How can I verify the complete
integrity of
a database - especially a very large one where a reload or full index
rebuild could take on the order of days?

Wes


Nov 23 '05 #2
Wes wrote:
On a nightly basis, we shut the database down and do a file system backup.

A short chronology of our database problem: [snip] Question:

How can we tell that a database is intact? In the above example, pg_dumpall
worked on the 8/21 database. Did it become corrupt between 8/21 and 8/23,
or was it already corrupt and got worse? Pg_dumpall tells you nothing about
the condition of indexes. Could a corrupt index corrupt data blocks?

I'm looking at doing a pg_dumpall on a weekly basis so that we have a point
in time where we know we have a recoverable database. When the database
reaches several hundred GB and over a billion rows, this isn't a great
solution, and doesn't address the overall database integrity.

Back to the original question... How can I verify the complete integrity of
a database - especially a very large one where a reload or full index
rebuild could take on the order of days?


You shouldn't have to verify anything. PG's job is to never corrupt your
data, and providing your hardware is good it should do so. If you are
getting problems almost daily that would suggest a RAM/disk problem to
me (sig 11 usually implies RAM). Can't guarantee it's not PG, but its
record of reliability is pretty good.

Steps I'd take:
1. Check your version number against the release notes and see if you
should upgrade (a one-line check is shown below). You don't mention your
version, but it's always worth having the latest dot-release (7.2.5, 7.3.7,
7.4.5).
2. Schedule time to run memory/disk tests against your hardware. Finding
48 hours might not be easy, but you need to know where you stand.
3. Set up Slony or some other replication so I can schedule my downtime.
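For step 1, finding out what you're actually running is a one-liner:

psql -U postgres -d template1 -t -c "SELECT version();"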

--
Richard Huxton
Archonet Ltd


Nov 23 '05 #3
Wes
On 9/3/04 3:11 AM, "Richard Huxton" <de*@archonet.com> wrote:
You shouldn't have to verify anything. PG's job is to never corrupt your
data, and providing your hardware is good it should do so. If you are
getting problems almost daily that would suggest a RAM/disk problem to
me (sig 11 usually implies RAM). Can't guarantee it's not PG, but its
record of reliability is pretty good.
I believe SEGV typically just indicates it de-referenced a bad pointer (i.e.
NULL or out of range). The problem is not occurring on a daily basis. The
database has been in service since December of last year. It's just that
the symptoms progressed from no apparent symptoms, to a clearly corrupt DB.
My guess is that some minor corruption fed upon itself until the DB couldn't
even be dumped.
Steps I'd take:
1. Check your version number against the release notes and see if you
should upgrade. You don't mention your version, but it's always worth
having the last dot-release (7.2.5, 7.3.7, 7.4.5)
2. Schedule time to run memory/disk tests against your hardware. Finding
48 hours might not be easy, but you need to know where you stand.
3. Set up Slony or some other replication so I can schedule my downtime.


I thought I mentioned the level in my original mail - 7.4.1. We are
planning on running some diagnostics.

Whether there is a bug in PostgreSQL, or there was a memory hit, or whatever
doesn't really matter to the original question. The database can become
corrupt. How can I tell that a database is fully intact at any given point
in time? If I reload from a system backup before the known corruption, how
can I be sure that the original corruption that precipitated the failure is
not still there and will again rear its ugly head?

Wes

Nov 23 '05 #4
On 9/3/2004 10:59 AM, Wes wrote:
On 9/3/04 3:11 AM, "Richard Huxton" <de*@archonet.com> wrote:
You shouldn't have to verify anything. PG's job is to never corrupt your
data, and providing your hardware is good it should do so. If you are
getting problems almost daily that would suggest a RAM/disk problem to
me (sig 11 usually implies RAM). Can't guarantee it's not PG, but its
record of reliability is pretty good.
I believe SEGV typically just indicates it de-referenced a bad pointer (i.e.
NULL or out of range). The problem is not occurring on a daily basis. The
database has been in service since December of last year. It's just that
the symptoms progressed from no apparent symptoms, to a clearly corrupt DB.
My guess is that some minor corruption fed upon itself until the DB couldn't
even be dumped.


Right, that's what a SIGSEGV is. And the usual reason for the bad value
in that pointer is bad memory. What do you base your guess of a self
multiplying corruption on? Or is this pure handwaving in self-defense?
Steps I'd take:
1. Check your version number against the release notes and see if you
should upgrade. You don't mention your version, but it's always worth
having the last dot-release (7.2.5, 7.3.7, 7.4.5)
2. Schedule time to run memory/disk tests against your hardware. Finding
48 hours might not be easy, but you need to know where you stand.
3. Set up Slony or some other replication so I can schedule my downtime.
I thought I mentioned the level in my original mail - 7.4.1. We are
planning on running some diagnostics.


So you are running a release that had 4 official bugfix releases from
the vendor on hardware that is in an unknown condition? Is the server at
least configured with ECC RAM, or is the data not important enough to
justify quality hardware?

Whether there is a bug in PostgreSQL, or there was a memory hit, or whatever
doesn't really matter to the original question. The database can become
corrupt. How can I tell that a database is fully intact at any given point
in time? If I reload from a system backup before the known corruption, how
can I be sure that the original corruption that precipitated the failure is
not still there and will again rear its ugly head?


Dump and restore. You don't need to restore onto the same server. Any
test system with enough disk space would do.
Jan

--
#================================================= =====================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================= = Ja******@Yahoo.com #


Nov 23 '05 #5
Wes
On 9/4/04 5:53 AM, "Jan Wieck" <Ja******@yahoo.com> wrote:
[snip]
Well, I had a big long response composed to your snide remarks, but decided
I'm not getting in a flame war with you on something that is irrelevant to
the question I posed.
Is the server at
least configured with ECC RAM, or is the data not important enough to
justify quality hardware?


As a matter of fact it does have ECC, and no errors have been reported by
the system or diagnostics. It's a Dell 6650. No disk errors have been
reported either (hardware RAID controller).

As it pertains to the question at hand, I don't care what caused the
corruption. It's totally irrelevant to the question. Perfect hardware and
perfect software don't exist. There will be hardware failures. There will
be software failures. The question was whether or not there is a feasible
way of determining at any given point in time that a database is fully
intact. We are now doing weekly pg_dumpall runs. Doing a restore of
each of those simply isn't reasonable. If there is no such integrity
utility to scan the DB in place, then that's the way it is. That's the
answer to my question. But quit trying to blame crappy hardware.

Wes

Nov 23 '05 #6
On 9/4/2004 2:26 PM, Wes wrote:
On 9/4/04 5:53 AM, "Jan Wieck" <Ja******@yahoo.com> wrote:

intact. We are now doing weekly pg_dumpall runs. Doing a restore of
each of those simply isn't reasonable. If there is no such integrity


Why isn't doing a restore of those reasonable?
Jan

--
#================================================= =====================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================= = Ja******@Yahoo.com #


Nov 23 '05 #7
Wes
On 9/4/04 2:42 PM, "Jan Wieck" <Ja******@yahoo.com> wrote:
Why isn't doing a restore of those reasonable?


Because of the size and time required. Right now, it takes at least 24
hours, with a full hardware configuration (multi-CPU, 8 disk SCSI RAID,
etc). That is going to do nothing but increase. Extrapolating linearly the
*current* load, it will take at least 4 days to load when the database size
peaks. But, based on past experience, the load will increase significantly
before then (probably by a factor of 2-4 to be conservative). When the
database gets that large, I have to consider that the reload time may not
increase linearly with the size. If we do a pg_dumpall once a week, it will
take longer to do a reload than the period between dumps. Just the pg_dump
alone could easily take close to a day. It also requires we have duplicate
fully configured hardware for each copy of the database we run just to
verify a pg_dumpall - if it takes that long to load, I can't share hardware.
Add to that the people time to monitor the systems and the cost of 2x the
hardware... In short, I need a solution that scales to huge databases
(hundreds of gigabytes to over a terabyte), not one that works just for
small to medium databases.

pg_dumpall is hopefully reliable and will presumably give me a snapshot that I
know I can restore from if the database becomes hopelessly corrupt. But I
can't individually test each one. I was hoping for a utility that would go
through and verify all indexes and data are consistent, and if not, attempt
to correct them.

As for your earlier question of cascading errors, consider a file system - a
type of database. If you get a file system error and correct it quickly,
you usually will lose nothing. If, however, you ignore that error, it is
likely to get worse over days or weeks. Other errors will crop up as a
result of the bad information in the first one. At some point, the file
system corruption may become so bad that it can't be recovered. Format and
reload. I have seen this on NTFS, UFS, HFS/HFS+, and even ReiserFS.
Journaling greatly reduces, but doesn't eliminate, this problem. There are
tools that will scan your file system and guarantee its integrity, or fix
the errors (or attempt to fix them) if they find any. I was looking for
something similar for a Postgres database.

Wes

Nov 23 '05 #8
Hi,

On Sat, 04.09.2004 at 22:51, Wes wrote:
On 9/4/04 2:42 PM, "Jan Wieck" <Ja******@yahoo.com> wrote:
Why isn't doing a restore of those reasonable?
Because of the size and time required. Right now, it takes at least 24
hours, with a full hardware configuration (multi-CPU, 8 disk SCSI RAID,
etc). That is going to do nothing but increase. Extrapolating linearly the
*current* load, it will take at least 4 days to load when the database size
peaks.

....
As for your earlier question of cascading errors, consider a file system - a
type of database. If you get a file system error and correct it quickly,
you usually will lose nothing. If, however, you ignore that error, it is
likely to get worse over days or weeks. Other errors will crop up as a
result of the bad information in the first one. At some point, the file
system corruption may become so bad that it can't be recovered. Format and
reload. I have seen this on NTFS, UFS, HFS/HFS+, and even ReiserFS.
Journaling greatly reduces, but doesn't eliminate, this problem. There are
tools that will scan your file system and guarantee its integrity, or fix
the errors (or attempt to fix them) if they find any. I was looking for
something similar for a Postgres database.


Well, with such a huge database you should probably consider
different backup strategies. A filesystem with snapshot
support (XFS?) could help: you can copy a state of the database
at any time, so you can back up the database cluster without
stopping the postmaster. Replication via Slony could also be
an option.
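A rough sketch of that kind of snapshot backup, assuming the whole cluster
(including pg_xlog) sits on a single LVM volume - the volume names, sizes
and paths are just placeholders, and XFS users would wrap the copy in
xfs_freeze:

# Take a read-only snapshot of the running cluster, archive it, drop it.
lvcreate --size 5G --snapshot --name pgsnap /dev/vg0/pgdata
mount -o ro /dev/vg0/pgsnap /mnt/pgsnap
tar czf /backup/pgdata-`date +%Y%m%d`.tar.gz -C /mnt/pgsnap .
umount /mnt/pgsnap
lvremove -f /dev/vg0/pgsnap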

The best tool to verify the backup is probably the postmaster
itself. I really doubt any other program would be smaller and
faster :)

(Filesystems provide a tool because the actual filesystem code
is a kernel module)

Regards
Tino

Nov 23 '05 #9
Wes
On 9/4/04 5:28 PM, "Tino Wildenhain" <ti**@wildenhain.de> wrote:
Well, with such a huge database you probably should consider
different backup strategies, a filesystem with snapshot
support (XFS?) could help where you can copy a state of the database
at any time - so you can backup the database cluster without
stopping the postmaster. Also replication via slony could be
an option.
Yes, we are looking into using file system snapshots. We are currently
using primarily file system backups (shut down the DB, back up the file
system). The problem we ran into was that we didn't have a specific point
in time where we knew with absolute certainty the backed up database was
good - snapshots would not help here.

I ended up starting with a recent backup, and working backwards until I
found one that wouldn't crash postmaster on a pg_dumpall. Rather than trust
that there was no corruption in that version (data blocks might be good, but
pg_dumpall doesn't test indexes), I did a pg_dumpall and reload.
The best tool to verify the backup is probably the postmaster
itself. I really doubt any other program would be smaller and
faster :)


Not really... Postmaster won't tell you if a structure is bad until it
stumbles on it and crashes (or politely reports an error). Just because
postmaster comes up doesn't mean your database is good. As far as I know,
there is no "verify database" command option on postmaster - postmaster
won't fsck your database.
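The weekly count(*) sweep over the user tables that I mentioned is about the
closest cheap substitute I have - roughly this (the database name is a
placeholder):

# Generate one "SELECT count(*)" per user table and run them all; this forces
# a sequential read of every heap page, but says nothing about the indexes.
psql -U postgres -d proddb -t -A -c \
  "SELECT 'SELECT count(*) FROM ' || quote_ident(schemaname) || '.'
          || quote_ident(tablename) || ';'
     FROM pg_tables
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema')" \
  | psql -U postgres -d proddb >/dev/null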

Wes

Nov 23 '05 #10
> > Why isn't doing a restore of those reasonable?

Because of the size and time required. Right now, it takes at least 24
hours, with a full hardware configuration (multi-CPU, 8 disk SCSI RAID,
etc). That is going to do nothing but increase.


It seems to me then that you need a better primary solution.
--
Mike Nolan


Nov 23 '05 #11
> I was looking for
something similar for a Postgres database.


There was this:

<http://svana.org/kleptog/pgsql/pgfsck.html>

Never used it, can't comment on it, doesn't appear to have been updated for
7.4. But it might be a jumping-off point.
--
Scott Ribe
sc********@killerbytes.com
http://www.killerbytes.com/
(303) 665-7007 voice

Nov 23 '05 #12
Why isn't doing a restore of those reasonable?


Because of the size and time required. Right now, it takes at least 24
hours, with a full hardware configuration (multi-CPU, 8 disk SCSI RAID,
etc). That is going to do nothing but increase. Extrapolating linearly
the


I understand you restore to a test server... Can you configure your test
server with no fsync? PG will have only one connection, so you can bump
sort_mem and the like. Will the times be reasonable then?

Doesn't a successful pg_dumpall prove the database is intact?


Nov 23 '05 #13
On 9/4/2004 4:51 PM, Wes wrote:
On 9/4/04 2:42 PM, "Jan Wieck" <Ja******@yahoo.com> wrote:
Why isn't doing a restore of those reasonable?


Because of the size and time required. Right now, it takes at least 24
hours, with a full hardware configuration (multi-CPU, 8 disk SCSI RAID,


24 hours to do what? The pg_dumpall, the restore or both? And what media
is used to save the dump? Same disk array, tape, separate disk array on
separate controller or separate archive server? What is the bottleneck?

Planning the backup solution out of the blue, I would have a separate
archive system. This system is connected via gigabit ethernet and has a
storage array large enough to hold at least 2 full dumps. It also has
the tape drive(s) to write off the dumps from the array for storage in
an alternate location. Preferably, I would have a little more time to
finish the still missing features in Slony-I and take the dumps off a
replica :-)

Since another purpose is to verify if a dump is consistent, the archive
system would have another simple stripe set that can hold the entire
database, plus a ramdisk holding the xlog.

PostgreSQL will most probably not have any kind of MyISAMcheck or
whatever that utility is called. IIRC Red Hat has something that can check
offline for the integrity of a single heap. But I don't know if it is
still under development or if they realized that this is a dead end
because of the offline limitation.
Jan

--
#================================================= =====================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================= = Ja******@Yahoo.com #


Nov 23 '05 #14
> Doesn't a successful pg_dumpall prove the database is intact ?

It is very difficult to prove a negative, that there are no errors.
What a successful pg_dumpall says is that pg_dumpall didn't find any
errors, not that there were none present.

Also, I thought part of the question was how to determine if the output
from the pg_dumpall was itself intact without doing a complete restore.
--
Mike Nolan


Nov 23 '05 #15
Pardon me for this, but it's Monday morning here and I'm in shallow
thinking mode.

I was just looking over Wes's questions about checking database
integrity, and the responses given, and I'm wondering if keeping a live
CRC on each record would make sense? Or is this already done, or maybe
an option? (Not that it would in any way solve Wes's company's current
problem, ...)

--
Joel <re**@ddcom.co.jp>

Nov 23 '05 #16
Wes
On 9/5/04 9:04 AM, "Jan Wieck" <Ja******@yahoo.com> wrote:
24 hours to do what? The pg_dumpall, the restore or both?


There's more than 250 million rows. If I remember right, it's ballpark 25%
data reload, 75% index/foreign constraint rebuild. Pg_dumpall is something
like 3 hours or so.

Wes

Nov 23 '05 #17
Wes <we****@syntegra.com> writes:
There's more than 250 million rows. If I remember right, it's ballpark 25%
data reload, 75% index/foreign constraint rebuild. Pg_dumpall is something
like 3 hours or so.


FWIW, increasing sort_mem for the reload process would probably help
with the index and FK rebuilds. (8.0 rejiggers things so that the
memory constraints for these operations are driven off a separate
variable, but in 7.* you need to muck about with sort_mem. The
appropriate value for one-off operations is a lot higher than what
you would want multiple competing backends to be using.)
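For example, something along these lines for just the restoring session
(sort_mem is in kilobytes on 7.x, so 524288 is 512MB; the database and dump
file names are placeholders):

# Pass a one-off sort_mem to the backend for this session only.
PGOPTIONS='-c sort_mem=524288' psql -U postgres -d newdb -f dump.sql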

regards, tom lane


Nov 23 '05 #18
Wes
On 9/8/04 9:11 AM, "Tom Lane" <tg*@sss.pgh.pa.us> wrote:
FWIW, increasing sort_mem for the reload process would probably help
with the index and FK rebuilds. (8.0 rejiggers things so that the
memory constraints for these operations are driven off a separate
variable, but in 7.* you need to muck about with sort_mem. The
appropriate value for one-off operations is a lot higher than what
you would want multiple competing backends to be using.)


How big are we talking? I've got it set to 65535 right now. Multiply it by
10?

Wes

Nov 23 '05 #19
Wes <we****@syntegra.com> writes:
On 9/8/04 9:11 AM, "Tom Lane" <tg*@sss.pgh.pa.us> wrote:
FWIW, increasing sort_mem for the reload process would probably help
with the index and FK rebuilds.
How big are we talking? I've got it set to 65535 right now. Multiply it by
10?


How much RAM have you got in the machine? I'd think that you could
afford to put, say, a quarter or so of physical RAM into sort_mem, if
there is nothing much else going on during the data import. 64MB is
not a lot on that scale.

regards, tom lane


Nov 23 '05 #20
>>>>> "TL" == Tom Lane <tg*@sss.pgh.pa.us> writes:

TL> Wes <we****@syntegra.com> writes:
There's more than 250 million rows. If I remember right, it's ballpark 25%
data reload, 75% index/foreign constraint rebuild. Pg_dumpall is something
like 3 hours or so.


TL> FWIW, increasing sort_mem for the reload process would probably help
TL> with the index and FK rebuilds. (8.0 rejiggers things so that the

I shaved significant time from 7.4.x restores by bumping up the
checkpoint_segments to 50. My dumps currently take about 1.5 hours
over a 100Mbit ethernet (server is on gig ethernet, dumping client is
on 100Mbit).
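In postgresql.conf terms, the reload-time settings discussed in this thread
come out to roughly the following (the numbers are purely illustrative, and
fsync should only be turned off on a disposable test restore):

checkpoint_segments = 50    # fewer, larger checkpoints during the bulk load
sort_mem = 524288           # in KB on 7.x; helps index and FK rebuilds
fsync = false               # acceptable only on a throwaway restore box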

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D. Khera Communications, Inc.
Internet: kh***@kciLink.com Rockville, MD +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/


Nov 23 '05 #21

Believe it or not, we haven't gotten many requests for this feature,
partly because such corruption is so rare. Also, any checker isn't
going to find a change from "Baker" to "Faker" in a text field.

---------------------------------------------------------------------------

Wes wrote:
On 9/4/04 5:28 PM, "Tino Wildenhain" <ti**@wildenhain.de> wrote:
Well, with such a huge database you probably should consider
different backup strategies, a filesystem with snapshot
support (XFS?) could help where you can copy a state of the database
at any time - so you can backup the database cluster without
stopping the postmaster. Also replication via slony could be
an option.


Yes, we are looking into using file system snapshots. We are currently
using primarily file system backups (shut down the DB, back up the file
system). The problem we ran into was that we didn't have a specific point
in time where we knew with absolute certainty the backed up database was
good - snapshots would not help here.

I ended up starting with a recent backup, and working backwards until I
found one that wouldn't crash postmaster on a pg_dumpall. Rather than trust
that there was no corruption in that version (data blocks might be good, but
pg_dumpall doesn't test indexes), I did a pg_dumpall and reload.
The best tool to verify the backup is probably the postmaster
itself. I really doubt any other program would be smaller and
faster :)


Not really... Postmaster won't tell you if a structure is bad until it
stumbles on it and crashes (or politely reports an error). Just because
postmaster comes up doesn't mean your database is good. As far as I know,
there is no "verify database" command option on postmaster - postmaster
won't fsck your database.

Wes


--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


Nov 23 '05 #22
Wes
On 9/9/04 11:07 AM, "Bruce Momjian" <pg***@candle.pha.pa.us> wrote:
Believe it or not, we haven't gotten many requests for this feature,
partly because such corruption is so rare. Also, any checker isn't
going to find a change from "Baker" to "Faker" in a text field.


Yep, unless you added a CRC (and accepted the performance hit) to each
record the best you could do is verify that the database is consistent.
That would still be quite valuable, though - all block headers are valid,
indexes don't point out into oblivion, etc.
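Just to illustrate, that kind of per-record CRC could be bolted on with a
trigger - purely a sketch, with made-up table and column names, and md5()
standing in for a real CRC:

createlang plpgsql proddb 2>/dev/null || true
psql -U postgres -d proddb <<'SQL'
-- Maintain a per-row checksum on insert/update; verification later means
-- recomputing the md5 and comparing it with the stored value.
ALTER TABLE mytable ADD COLUMN row_md5 text;

CREATE FUNCTION set_row_md5() RETURNS trigger AS '
BEGIN
    NEW.row_md5 := md5(NEW.id::text || '':'' || NEW.payload);
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER mytable_md5 BEFORE INSERT OR UPDATE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE set_row_md5();
SQL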

I expect there are only a handful of huge databases running a heavy load -
the vast majority are probably no larger than tens (maybe hundreds) of
megabytes, or do not experience a heavy update load?

Wes

Nov 23 '05 #23
Wes
On 9/9/04 9:27 AM, "Vivek Khera" <kh***@kcilink.com> wrote:
I shaved significant time from 7.4.x restores by bumping up the
checkpoint_segments to 50. My dumps currently take about 1.5 hours
over a 100Mbit ethernet (server is on gig ethernet, dumping client is
on 100Mbit).


Mine are already set to 30, but in a recent re-index, I saw warnings in the
log, so I need to bump it up some more. I'm likely going to be doing a reload
this weekend, so will include this in the run.

Wes

Nov 23 '05 #24

Bruce Momjian <pg***@candle.pha.pa.us> writes:
Believe it or not, we haven't gotten many requests for this feature,
partly because such corruption is so rare. Also, any checker isn't
going to find a change from "Baker" to "Faker" in a text field.


Well there were people asking for per-block checksums. Given the frequency of
people complaining about hardware problems I'm starting to think they may be a
good idea after all.

But checksums don't detect corruption caused by software failure. That would
require more of an fsck-like approach. There's nothing wrong with the idea,
but it would be a lot of work and nobody else has felt the need for it.

--
greg

Nov 23 '05 #25
>>>>> "W" == Wes <we****@syntegra.com> writes:

W> I expect there are only a handful of huge databases running a heavy load -
W> the vast majority are probably no larger than tens (maybe hundreds) of
W> megabytes, or do not experience a heavy update load?

Gigabytes: 40+ currently (old data is pruned)
Updates: jillions and jillions
Inserts: several hundred millions
Years: 3+
Corruptions: 0
Downtime: only for planned OS and Pg upgrades (7.1 -> 7.2 -> 7.4)
OS: FreeBSD 4.x
Hardware: Dell PowerEdge rackmount with SCSI hardware RAID

I sleep well at night :-)

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D. Khera Communications, Inc.
Internet: kh***@kciLink.com Rockville, MD +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/


Nov 23 '05 #26
Now THAT"S what I like to hear.....
I too am on FreeBSD....Dell PowerEdge SCSI Hardware RAID....

I too will now sleep well at night!!

Vivek Khera wrote:
>>"W" == Wes <we****@syntegra.com> writes:
>>
>>


W> I expect there are only a handful of huge databases running a heavy load -
W> the vast majority are probably no larger than tens (maybe hundreds) of
W> megabytes, or do not experience a heavy update load?

Gigabytes: 40+ currently (old data is pruned)
Updates: jillions and jillions
Inserts: several hundred millions
Years: 3+
Corruptions: 0
Downtime: only for planned OS and Pg upgrades (7.1 -> 7.2 -> 7.4)
OS: FreeBSD 4.x
Hardware: Dell PowerEdge rackmount with SCSI hardware RAID

I sleep well at night :-)


Nov 23 '05 #27
On Thu, Sep 09, 2004 at 12:30:31PM -0500, Wes wrote:
On 9/9/04 11:07 AM, "Bruce Momjian" <pg***@candle.pha.pa.us> wrote:
Believe it or not, we haven't gotten many requests for this feature,
partly because such corruption is so rare. Also, any checker isn't
going to find a change from "Baker" to "Faker" in a text field.


Yep, unless you added a CRC (and accepted the performance hit) to each
record the best you could do is verify that the database is consistent.
That would still be quite valuable, though - all block headers are valid,
indexes don't point out into oblivion, etc.

I expect there are only a handful of huge databases running a heavy load -
the vast majority are probably no larger than tens (maybe hundreds) of
megabytes, or do not experience a heavy update load?


I have a CRM-type application installed at a number of customer
sites. Database sizes are in the 5 - 50 gigabyte range, and update
rates are pretty high - hundreds of thousands a day, certainly, probably
over a million a day at some sites.

They've been running continuously for several years, on a mix of
Solaris and Linux systems ranging from big SMP Sun boxes with nice
RAID systems down to dinky little dual-CPU Linux/x86 boxes with a
couple of software mirrored drives.

A lot of the data is of a form where data corruption will be
immediately recognised by the application (has to be one of a few
pre-defined strings, or integers that have to be in a given range,
that sort of thing).

I don't believe that any customer has had database-related downtime
apart from the scheduled, painful 7.2->7.4 upgrade. I've never seen a
problem due to data corruption, despite a number of kernel panics,
power failures and other such facts of life.

And that's with no manual database maintenance at all. Just an
application-specific maintenance daemon that does cleanup, vacuuming
and analysis.

Cheers,
Steve

Nov 23 '05 #28
Wes
On 9/9/04 3:36 PM, "Vivek Khera" <kh***@kcilink.com> wrote:
Gigabytes: 40+ currently (old data is pruned)
Updates: jillions and jillions
Inserts: several hundred millions
Years: 3+
Corruptions: 0
Downtime: only for planned OS and Pg upgrades (7.1 -> 7.2 -> 7.4)
OS: FreeBSD 4.x
Hardware: Dell PowerEdge rackmount with SCSI hardware RAID

I sleep well at night :-)


That is close to describing ours both size and hardware-wise, although it
will be growing significantly larger than that. I slept well until about a
week ago... The thing had been running for 9 months without a hit until
then.

I did say "most", not "all"... It's good to see specifics on DB's similar
in size to ours.

Wes

Nov 23 '05 #29
Joel wrote:
Pardon me for this, but it's Monday morning here and I'm in shallow
thinking mode.

I was just looking over Wes's questions about checking database
integrity, and the responses given, and I'm wondering if keeping a live
CRC on each record would make sense? Or is this already done, or maybe
an option? (Not that it would in any way solve Wes's company's current
problem, ...)


TODO has:

* Add optional CRC checksum to heap and index pages

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


Nov 23 '05 #30
On 9/8/04 5:51 PM, "Tom Lane" <tg*@sss.pgh.pa.us> wrote:
How much RAM have you got in the machine? I'd think that you could
afford to put, say, a quarter or so of physical RAM into sort_mem, if
there is nothing much else going on during the data import. 64MB is
not a lot on that scale.


The system currently has 2GB. The system is dedicated to this database, so
for a dedicated load I would think I could set it up to 1GB?

Wes

Nov 23 '05 #31
