I am normally the admin for our PostgreSQL servers, but someone else tried
to kill a runaway query while I was out sick. Unfortunately, he tried
killing the runaway query by killing the postmaster. Now we are dealing
with a set of recurring problems that just won't go away:
1) Spontaneous backend deaths.
2) Spontaneous backend disconnects (probably a consequence of 1).
3) pg_clog files disappearing.
The first symptom of the problem was a message I received while within psql.
It was something like this: "Some backend process died abnormally, causing
your session to terminate." I could reconnect to the database afterwards.
The second symptom was another user receiving an error message along the
lines of:
"ERROR: could not access status of transaction 97201274
DETAIL: could not open file "/var/lib/pgsql/data/pg_clog/005C": No such
file or directory".
The query was a simple count(*) operation: "select count(*) from
[sometable]".
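For what it's worth, the segment name in the error can be derived from the
transaction ID: each transaction's status takes 2 bits in the clog, so an
8 kB page holds 32768 statuses, and (if I understand the SLRU layout
correctly) each segment file holds 32 pages. A quick sanity check:

```shell
# Each xid takes 2 bits of clog -> 8192 * 4 = 32768 xids per page,
# and (assumed) 32 pages per segment -> 1048576 xids per segment.
xid=97201274
xacts_per_segment=$((8192 * 4 * 32))
printf "segment %04X\n" $((xid / xacts_per_segment))   # prints: segment 005C
```

So the xid in the error message really does land in segment 005C, which at
least confirms the missing file is the one that query needed.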
I read prior Usenet postings for this error and did the following:
1) touch /var/lib/pgsql/data/pg_clog/005C.temp
2) chown postgres.postgres /var/lib/pgsql/data/pg_clog/005C.temp
3) chmod 0600 /var/lib/pgsql/data/pg_clog/005C.temp
4) dd if=/dev/zero of=/var/lib/pgsql/data/pg_clog/005C.temp bs=8192 count=1
5) mv /var/lib/pgsql/data/pg_clog/005C.temp /var/lib/pgsql/data/pg_clog/005C
Then I reloaded PostgreSQL (the prior postings didn't indicate a need to
reload or restart the database, so I chose the least disruptive option):
/etc/init.d/postgresql reload
I tried the query again, and I got an error message saying that PostgreSQL
couldn't read the new 005C file past a certain offset. So I reran the dd
command above, but used a count of 100 instead of 1. The count query then
worked. I crossed my fingers and hoped the problem was solved.
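If my arithmetic is right, a full clog segment is 32 pages of 8 kB, i.e.
256 kB, so instead of guessing at a count the zero-filled replacement can be
created at full segment size in one step. A sketch (written against /tmp for
illustration; the real target would be the pg_clog path used above):

```shell
# Build a zero-filled clog segment at its full size:
# 32 pages x 8192 bytes = 262144 bytes (256 kB).
dd if=/dev/zero of=/tmp/005C.temp bs=8192 count=32
wc -c < /tmp/005C.temp   # prints: 262144
```

A count of 100 as I used above overshoots that size, which apparently did no
harm, but matching the real segment size seems safer.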
However, the same problem reappeared the next morning. My suspicion was
that PostgreSQL's metadata tables weren't properly updated since I
performed the above operations on a running database. I then brought
PostgreSQL down completely (/etc/init.d/postgresql stop), repeated the
procedure, then restarted the database. Again, the problem seemed to be
fixed.
That was until a user called and reported an error message about a
spontaneous backend disconnection (I viewed the message myself, and it
indeed said that data could not be received from the server). I'm not
certain that was caused by PostgreSQL, because the developer of that
particular program has had long-standing problems caused by his own bugs.
I came into work this morning, tried the same select count(*) query, and got
the same file-not-found error. Following the same procedure fixed the
problem again, but by now I know the fix is only temporary.
My questions are:
1) What kind of database damage can I expect to find?
2) I think my daily pg_dumps are succeeding, so is my only real option to
start a new database and restore from a backup?
3) Why does the clog segment I create keep disappearing? Is it because of
vacuuming?
4) Can this problem be fixed without resorting to creating a new database
cluster? This cluster hosts about a dozen active databases spread across 4
departments.