473,385 Members | 1,615 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,385 software developers and data experts.

Postmaster Problem

I am normally the admin for our PostgreSQL servers, but someone else tried
to kill a runaway query while I was out sick. Unfortunately, he tried
killing the runaway query by killing the postmaster. Now we are dealing
with a persistent problem that just won't go away.

1) Spontaneous back end deaths.
2) Spontaneous backend disconnects (which are probably because of 1).
3) pg_clog files disappearing.

The first symptom of the problem was a message I received while within psql.
It was something like this: "Some backend process died abnormally, causing
your session to terminate." I could reconnect to the database afterwards.

The second symptom was another use receiving an error message along the
lines of:

"ERROR: could not access status of transaction 97201274
DETAIL: could not open file "/var/lib/pgsql/data/pg_clog/005C": No such
file or directory".

The query was a simple count(*) operation: "select count(*) from
[sometable]".

I read prior Usenet postings for this error and did the following:

1) touch /var/lib/pgsql/data/pg_clog/005C.temp
2) chown postgres.postgres /var/lib/pgsql/data/pg_clog/005C.temp
3) chmod 0600 /var/lib/pgsql/data/pg_clog/005C.temp
4) dd if=/dev/zero of=/var/lib/pgsql/data/pg_clog/005C.temp bs=8192 count=1
5) mv /var/lib/pgsql/data/pg_clog/005C.temp /var/lib/pgsql/data/pg_clog/005C

Then I reloaded PostgreSQL (the prior postings didn't indicated a need to
reload or restart the database, so I chose the least disruptive option):

/etc/init.d/postgresql reload

I tried the query again, and I got an error message saying that PostgreSQL
couldn't read the new 005C file past a certain offset. So I reran the dd
command above, but used a count of 100 instead of 1. The count query then
worked. I crossed my fingers and hoped the problem was solved.

However, the same problem reappeared the next morning. My suspicion was
that PostgreSQL's metadata tables weren't properly updated since I
performed the above operations on a running database. I then brought
PostgreSQL down completely (/etc/init.d/postgresql stop), repeated the
procedure, then restarted the database. Again, the problem seemed to be
fixed.

That was until a user called and reported an error message about a
spontaneous backend disconnection (I viewed the message myself, and it
indeed said that data could not be received from the server). I'm not
certain that was caused by PostgreSQL, because the developer of that
particular program has had long standing problems caused by his own bugs.

I came into work this morning, and tried the same select count(*) query that
I mentioned above, and I got the same file not found error I posted above.
Following the same procedures I outlined above fixed the problem, but now I
know the fix is only temporary.

My questions are:

1) What kind of database damage can I expect to find?

2) I think my daily pg_dumps are succeeding, so is my only real option to
start a new database and restore from a backup?

3) Why is the clog I create disappearing? Is it because of vacuuming?

4) Can this problem be fixed without resorting to creating a new database
cluster? This cluster hosts about a dozen active databases spread across 4
departments.

Mar 2 '06 #1
2 5307
Tony O'Bryan wrote:
I am normally the admin for our PostgreSQL servers, but someone else tried
to kill a runaway query while I was out sick. Unfortunately, he tried
killing the runaway query by killing the postmaster. Now we are dealing
with a persistent problem that just won't go away.


The only quick solution appears to be copying the database cluster.

1) Create a new cluster at a different location using initdb
2) Disable remote access to the old cluster.
3) Dump the cluster.
4) Start a parallel instance of the postmaster at the new location.
5) Import the dump into the new instance.
6) Shut down both instances.
7) Rename the old instance to an archival area.
8) Rename the new instance to that of the old instance.
9) Restart the database.

The new instance is now fixed of all the problems caused by killing the
postmaster.

Mar 3 '06 #2
Tony O'Bryan wrote:
Tony O'Bryan wrote:

I am normally the admin for our PostgreSQL servers, but someone else tried
to kill a runaway query while I was out sick. Unfortunately, he tried
killing the runaway query by killing the postmaster. Now we are dealing
with a persistent problem that just won't go away.

The only quick solution appears to be copying the database cluster.

1) Create a new cluster at a different location using initdb
2) Disable remote access to the old cluster.
3) Dump the cluster.
4) Start a parallel instance of the postmaster at the new location.
5) Import the dump into the new instance.
6) Shut down both instances.
7) Rename the old instance to an archival area.
8) Rename the new instance to that of the old instance.
9) Restart the database.

The new instance is now fixed of all the problems caused by killing the
postmaster.

I'm not an expert with Postgres but do use it a lot and after reading
your problem and before looking at your follow up I would have suggested
backing up your data with a pg_dump and re-installing the DB cluster.
Interesting how simple and yet effective Postgres is! I love this
database!!!

Schmidty
Apr 2 '06 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
by: Chris Webster | last post by:
I have one process which writes a single float into 300 columns once per second. I then run 4 process, from remote computers, to query a small subset of the latest row. I have even commented...
2
by: Andy Harrison | last post by:
-----BEGIN PGP SIGNED MESSAGE----- Could anyone explain about the max_connections in postgres? We ran into the connection limit on one of our production servers. One reason is that we have...
2
by: pw | last post by:
Hello, I am having problems running postmaster. I get the following error: DEBUG: invoking IpcMemoryCreate(size=1466368) PANIC: invalid checksum in control file The postmaster then...
0
by: Melanie Bergeron | last post by:
Hi all! I have a big problem installing postmaster as a service under Windows XP with cygwin. I already installed postgresql on 2 machines without problems but this machine really seem to don't...
8
by: Lincoln Yeoh | last post by:
Hi, I recently upgraded to postgresql 7.4 and I am having a problem with postmaster using lots of memory for a query (keeps growing even up to 400MB+ till I stop postgresql ). I don't recall...
1
by: Durai raj | last post by:
Hello All, After starting the postmaster, I just entered in database like: $ psql test Welcome to psql 7.4.2, the PostgreSQL interactive terminal. Type: \copyright for distribution terms
8
by: Thomas Hallgren | last post by:
I'm writing a small test harness. I have two threads. One that starts the postmaster and another that does all the testing and finally stops the postmaster with a pg_ctl stop. At present, the...
2
by: Ying Lu | last post by:
Hello all, I met a problem about cannot start postmaster. The situation I met is that I was running postmaster, but because of the server closed all my konsole. As a result, I did not...
2
by: stefari | last post by:
Hi all, I have to develop an embedded database with PostgreSQL. I have a Linux Distribuction that works only in single user mode as root. The postmaster daemon starts only if I 'm not root...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.