Bytes IT Community
server auto-restarts and ipcs

A power failure led to failed postmaster restart using 7.4.6 (see output
below). The short-term fix is usually to delete the pid file and restart.

I often wonder why ipcs never seems to show the shared memory
block in question? Am I using the wrong command? Does the key
mentioned by pgsql map to the key in the ipcs output? And if the
shared segment is simply not there, would it be possible for pgsql to
figure that out ala Apache, search the process table, and go ahead
and restart if it didn't see a postmaster already running? I'm sure this
has been asked and answered, I just couldn't find it via google...

TIA.

Ed

Database and process is pg746dba...

$ cat logs-pg746-7.4.6/server_log.Mon
pg_ctl: Another postmaster may be running. Trying to start postmaster anyway.
2004-11-08 17:17:22.398 [18038] FATAL: pre-existing shared memory block (key 9746001, ID 658210829) is still in use
HINT: If you're sure there are no old server processes still running, remove the shared memory block with the command "ipcrm", or just delete the file "/users/pg746dba/dbclusters/pg746/postgresql-7.4.6/data/postmaster.pid".
pg_ctl: cannot start postmaster
Examine the log output.

$ ipcs

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 32768 ed 777 393216 2 dest
0x00000000 131073 root 644 110592 4 dest
0x00000000 3538946 ed 777 393216 2 dest
0x00000000 3670019 ed 777 393216 2 dest
0x00000000 4685828 ed 777 393216 2 dest
0x00000000 4816901 ed 777 393216 2 dest
0x00000000 4915206 ed 777 393216 2 dest
0x00000000 4980743 ed 777 393216 2 dest
0x00000000 5046280 ed 777 393216 2 dest
0x00000000 5111817 ed 777 393216 2 dest
0x00000000 5537802 root 644 110592 3 dest
0x00000000 6651915 ed 777 393216 2 dest
0x00000000 19595276 ed 666 14400 1 dest
0x00000000 11272205 root 644 110592 2 dest

------ Semaphore Arrays --------
key semid owner perms nsems

------ Message Queues --------
key msqid owner perms used-bytes messages

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 23 '05 #1
14 Replies

"Ed L." <pg***@bluepolka.net> writes:
A power failure led to failed postmaster restart using 7.4.6 (see output
below). The short-term fix is usually to delete the pid file and restart. I often wonder why ipcs never seems to show the shared memory
block in question?


The shared memory block would certainly not still exist after a system
reboot, so what we have here is a misleading error message. Looking at
the code, the most plausible explanation appears to be that
shmctl(IPC_STAT) is failing (which it ought to) and returning some errno
code different from EINVAL (which is the case we are expecting to see).
What platform are you on, and what does its shmctl(2) man page document
as error conditions?

regards, tom lane

Nov 23 '05 #2

On Monday November 8 2004 6:16, Tom Lane wrote:
"Ed L." <pg***@bluepolka.net> writes:
A power failure led to failed postmaster restart using 7.4.6 (see
output below). The short-term fix is usually to delete the pid file
and restart.

I often wonder why ipcs never seems to show the shared memory
block in question?


The shared memory block would certainly not still exist after a system
reboot, so what we have here is a misleading error message. Looking at
the code, the most plausible explanation appears to be that
shmctl(IPC_STAT) is failing (which it ought to) and returning some errno
code different from EINVAL (which is the case we are expecting to see).
What platform are you on, and what does its shmctl(2) man page document
as error conditions?


Platform is Linux 2.4.20-30.9 on i686 (Pentium 4, I think).

From man 2 shmctl:

ERRORS
    On error, errno will be set to one of the following:

    EACCES     is returned if IPC_STAT is requested and shm_perm.modes
               does not allow read access for shmid.

    EFAULT     The argument cmd has value IPC_SET or IPC_STAT but the
               address pointed to by buf isn’t accessible.

    EINVAL     is returned if shmid is not a valid identifier, or cmd
               is not a valid command.

    EIDRM      is returned if shmid points to a removed identifier.

    EPERM      is returned if IPC_SET or IPC_RMID is attempted, and the
               effective user ID of the calling process is not the
               creator (as found in shm_perm.cuid), the owner (as found
               in shm_perm.uid), or the super-user.

    EOVERFLOW  is returned if IPC_STAT is attempted, and the gid or uid
               value is too large to be stored in the structure pointed
               to by buf.

CONFORMING TO
    SVr4, SVID. SVr4 documents additional error conditions EINVAL,
    ENOENT, ENOSPC, ENOMEM, EEXIST. Neither SVr4 nor SVID documents an
    EIDRM error condition.

Nov 23 '05 #3

On Monday November 8 2004 7:24, Ed L. wrote:
On Monday November 8 2004 6:16, Tom Lane wrote:
"Ed L." <pg***@bluepolka.net> writes:
A power failure led to failed postmaster restart using 7.4.6 (see
output below). The short-term fix is usually to delete the pid file
and restart.

I often wonder why ipcs never seems to show the shared memory
block in question?


The shared memory block would certainly not still exist after a system
reboot, so what we have here is a misleading error message. Looking at
the code, the most plausible explanation appears to be that
shmctl(IPC_STAT) is failing (which it ought to) and returning some
errno code different from EINVAL (which is the case we are expecting to
see). What platform are you on, and what does its shmctl(2) man page
document as error conditions?


Platform is Linux 2.4.20-30.9 on i686 (Pentium 4, I think).


I recently saw this same thing happen from a power failure on several HPUX
boxes as well (I think running B.11.00/11.23 with 7.3.4/7.3.7, but not
sure).

Ed

Nov 23 '05 #4

"Ed L." <pg***@bluepolka.net> writes:
A power failure led to failed postmaster restart using 7.4.6 (see
output below). The short-term fix is usually to delete the pid file
and restart.


Thinking some more about this ... does anyone know the algorithm used
in Linux to assign shared memory segment IDs?

Your report shows about a dozen shmem segments in use; which would put
the probability of an accidental collision at pretty-tiny. But if the
kernel's assignment algorithm is nonrandom then it'd be plausible for
the Postgres shmem ID from the previous system boot cycle to match
one of the shmem IDs already handed out in the current boot cycle.
In that case we'd get EACCES from shmctl() which we take to be a trouble
indication. (This is probably over-conservatism, but I don't want to
relax it without knowing for sure that we need to.)
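
The conservative rule described here can be sketched roughly as follows. This is only an illustration of the decision as stated in this thread, written in Python for brevity; it is not the actual PostgreSQL source, and the function name is made up for the example:

```python
import errno


def stat_failure_means_in_use(stat_errno):
    """Decide what a failed shmctl(IPC_STAT) implies for startup.

    stat_errno is the errno value left behind by the failed call.
    Only EINVAL ("not a valid identifier") is read as proof that the
    old segment is gone; every other error is treated as trouble.
    """
    if stat_errno == errno.EINVAL:
        return False  # segment does not exist, safe to start
    return True       # EACCES, EIDRM, ...: assume it may still be in use


for code in (errno.EINVAL, errno.EACCES, errno.EIDRM):
    print(errno.errorcode[code], stat_failure_means_in_use(code))
```

With this rule, any stale ID that happens to name another user's live segment after a reboot blocks startup, which matches the symptom reported.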

BTW, do you know what all those shmem segments are for? My Linux box
shows only one segment in use besides the ones Postgres is using.

regards, tom lane

Nov 23 '05 #5

On Monday November 8 2004 8:41, Tom Lane wrote:

BTW, do you know what all those shmem segments are for? My Linux box
shows only one segment in use besides the ones Postgres is using.


Looks like Ximian Evolution apps, X, Mozilla, Wombat, etc ...

Ed

Nov 23 '05 #6

On Mon, 2004-11-08 at 17:47 -0700, Ed L. wrote:
I often wonder why ipcs never seems to show the shared memory
block in question?


The permissions of the shared memory block and the semaphore arrays are
600. ipcs seems not to report objects which you cannot access. Run
ipcs as root and you should see the PostgreSQL shared memory segment and
semaphores.

--
Oliver Elphick ol**@lfix.co.uk
Isle of Wight http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA
========================================
"O death, where is thy sting? O grave, where is
thy victory?" 1 Corinthians 15:55

Nov 23 '05 #7

On Tuesday November 9 2004 2:16, Oliver Elphick wrote:
On Mon, 2004-11-08 at 17:47 -0700, Ed L. wrote:
I often wonder why ipcs never seems to show the shared memory
block in question?


The permissions of the shared memory block and the semaphore arrays are
600. ipcs seems not to report objects which you cannot access. Run
ipcs as root and you should see the PostgreSQL shared memory segment and
semaphores.


I don't see them when running ipcs as root, either. Not sure that would
make sense given the shared memory is created as the same user running
ipcs...

Ed

Nov 23 '05 #8

On Tue, 2004-11-09 at 07:00 -0700, Ed L. wrote:
On Tuesday November 9 2004 2:16, Oliver Elphick wrote:
On Mon, 2004-11-08 at 17:47 -0700, Ed L. wrote:
I often wonder why ipcs never seems to show the shared memory
block in question?


The permissions of the shared memory block and the semaphore arrays are
600. ipcs seems not to report objects which you cannot access. Run
ipcs as root and you should see the PostgreSQL shared memory segment and
semaphores.


I don't see them when running ipcs as root, either. Not sure that would
make sense given the shared memory is created as the same user running
ipcs...


If neither root nor their creator can see them, I assume they don't
exist. Certainly, with Linux 2.6 and util-linux 2.12, ipcs sees the
postgres objects whether it is run by root or by the postgres user.

--
Oliver Elphick ol**@lfix.co.uk
Isle of Wight http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA
========================================
"O death, where is thy sting? O grave, where is
thy victory?" 1 Corinthians 15:55

Nov 23 '05 #9

Tom Lane <tg*@sss.pgh.pa.us> writes:
"Ed L." <pg***@bluepolka.net> writes:
A power failure led to failed postmaster restart using 7.4.6 (see
output below). The short-term fix is usually to delete the pid file
and restart.


Thinking some more about this ... does anyone know the algorithm used
in Linux to assign shared memory segment IDs?


At least in 2.6 it seems to avoid reuse of ids by keeping a global counter
that is incremented every time a segment is created which ranges from 0..128k
that it multiplies by 32k and adds to the array index (which is reused
quickly).

So it doesn't seem plausible that there was an id collision unless this was
different in 2.4.20. However looking at his list of ids they're all separated
by multiples of 32769 which is what you would expect from this algorithm at
least until they start being reused.
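
To check the listing against that description, here is a small Python sketch that splits each shmid from the original post into a counter part and an array index. It assumes "32k" means exactly 32768 and that the layout is id = counter * 32768 + index; both are assumptions read off the description, not taken from kernel source:

```python
SEQ_MULTIPLIER = 32768  # assumed exact value of "32k"

# shmid column from the ipcs output in the original post, in listing order
shmids = [32768, 131073, 3538946, 3670019, 4685828, 4816901, 4915206,
          4980743, 5046280, 5111817, 5537802, 6651915, 19595276, 11272205]


def split_id(ipc_id):
    """Return (counter, array index) under the assumed layout."""
    return divmod(ipc_id, SEQ_MULTIPLIER)


for shmid in shmids:
    counter, index = split_id(shmid)
    print(f"{shmid:>9}  counter={counter:<5} index={index}")

# The ID named in the FATAL message, split the same way.
print(split_id(658210829))
```

Under that assumption the listed IDs occupy array indexes 0 through 13 in listing order, and the ID from the log message splits into counter 20087 and index 13.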

--
greg

Nov 23 '05 #10

Greg Stark <gs*****@MIT.EDU> writes:
At least in 2.6 it seems to avoid reuse of ids by keeping a global counter
that is incremented every time a segment is created which ranges from 0..128k
that it multiplies by 32k and adds to the array index (which is reused
quickly).

So it doesn't seem plausible that there was an id collision unless this was
different in 2.4.20. However looking at his list of ids they're all separated
by multiples of 32769 which is what you would expect from this algorithm at
least until they start being reused.


Oh I missed the fact that you were talking about after a reboot. So the
algorithm I described would produce exactly the same sequence of ids after any
reboot given the same sequence of creation and deletions. Even if there's a
different sequence as long as the n'th creation is for the m'th array slot it
would get the same id. So collisions would be very common.

(though it seems the sequence is shared across all the ipc objects.)
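
A toy Python model of the scheme as described, to show why the IDs repeat across reboots. The constants and the lowest-free-slot policy are assumptions made for the illustration, and the class is not derived from kernel code:

```python
SEQ_MULTIPLIER = 32768   # assumed value of "32k"
SEQ_LIMIT = 131072       # assumed value of "128k"; the counter wraps here


class IdAllocator:
    """Toy model of the described scheme; state starts fresh at every boot."""

    def __init__(self):
        self.seq = 0
        self.slots = []  # slots[i] holds the live id in array slot i, or None

    def create(self):
        # Reuse the lowest free slot, otherwise grow the array.
        try:
            index = self.slots.index(None)
        except ValueError:
            index = len(self.slots)
            self.slots.append(None)
        new_id = self.seq * SEQ_MULTIPLIER + index
        self.seq = (self.seq + 1) % SEQ_LIMIT
        self.slots[index] = new_id
        return new_id

    def remove(self, ipc_id):
        self.slots[self.slots.index(ipc_id)] = None


def boot_and_start_services():
    """Same creations and deletions, starting from a freshly booted allocator."""
    kernel = IdAllocator()
    first = kernel.create()
    second = kernel.create()
    kernel.remove(first)
    third = kernel.create()  # reuses slot 0 with a later counter value
    return [first, second, third]


print(boot_and_start_services() == boot_and_start_services())
```

Because nothing in the model survives a reboot, two boots that run the same startup sequence hand out identical IDs, so an ID recorded before the power failure can easily name somebody else's segment afterwards.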

--
greg

Nov 23 '05 #11

Greg Stark <gs*****@MIT.EDU> writes:
Oh I missed the fact that you were talking about after a reboot. So the
algorithm I described would produce exactly the same sequence of ids after any
reboot given the same sequence of creation and deletions. Even if there's a
different sequence as long as the n'th creation is for the m'th array slot it
would get the same id. So collisions would be very common.


This seems to square with Ed's complaint that he frequently sees a
collision after a reboot. I've just committed some code that makes a
more extensive check as to whether a pre-existing segment actually has
any relevance to our data directory; should fix the problem.

regards, tom lane

Nov 23 '05 #12

On Tuesday November 9 2004 1:37, Tom Lane wrote:
The shared memory block would certainly not still exist after a system
reboot, so what we have here is a misleading error message. Looking
at the code, the most plausible explanation appears to be that
shmctl(IPC_STAT) is failing (which it ought to) and returning some
errno code different from EINVAL (which is the case we are expecting
to see).


I believe the attached patch will fix this problem for you, at least on
the assumption that you are starting only one postmaster at system boot.


Just realizing we do start multiple postmasters under same user id when
upgrading a cluster (one on old port, one on new).

I noticed that ipcs on my linux box has a command-line option to list the
pid that created the segment. Not sure if such a library exists in usable
form, but looking for segments owned by the downed postmaster's pid would
seem to be what is needed. Just a thought...

Ed

Nov 23 '05 #13

"Ed L." <pg***@bluepolka.net> writes:
I noticed that ipcs on my linux box has a command-line option to list the
pid that created the segment. Not sure if such a library exists in usable
form, but looking for segments owned by the downed postmaster's pid would
seem to be what is needed. Just a thought...


[ thinks about it... ] Nah, it's still not bulletproof, because in a
system reboot situation you can't trust the old PID either. It could
easily be that the other guy gets both the PID and the shmem ID that
belonged to you last time.

I've committed changes for 8.0 that mark a shmem segment with the inode
of the associated data directory; that should be a stable enough ID to
handle all routine-reboot cases. (If you had to restore your whole
filesystem from backup tapes, it might be wrong, but you're going to be
doing such recovery manually anyway ...)
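
The idea can be sketched in Python as follows. The dict standing in for the segment header and its "creator_inode" field are hypothetical names for the illustration; the real change lives in the server's C code and is not reproduced here:

```python
import os
import tempfile


def data_dir_inode(data_dir):
    """Inode number of the data directory, as reported by stat(2)."""
    return os.stat(data_dir).st_ino


def segment_belongs_to(segment_header, data_dir):
    """True if the marker stored in the segment matches this data directory.

    segment_header is a hypothetical dict standing in for whatever is
    read back from the start of a pre-existing segment.
    """
    return segment_header.get("creator_inode") == data_dir_inode(data_dir)


with tempfile.TemporaryDirectory() as ours, tempfile.TemporaryDirectory() as theirs:
    header = {"creator_inode": data_dir_inode(ours)}  # written at creation time
    print(segment_belongs_to(header, ours))    # our own leftover segment
    print(segment_belongs_to(header, theirs))  # a segment check from another directory
```

Unlike a PID or a shmem ID, the directory's inode does not change across a routine reboot, which is what makes it usable as the marker.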

regards, tom lane

Nov 23 '05 #14

On Tuesday November 9 2004 4:35, Tom Lane wrote:
"Ed L." <pg***@bluepolka.net> writes:
I noticed that ipcs on my linux box has a command-line option to list
the pid that created the segment. Not sure if such a library exists in
usable form, but looking for segments owned by the downed postmaster's
pid would seem to be what is needed. Just a thought...

[ thinks about it... ] Nah, it's still not bulletproof, because in a
system reboot situation you can't trust the old PID either. It could
easily be that the other guy gets both the PID and the shmem ID that
belonged to you last time.


I see. Ipcs on my box also lists the date/time of shared memory segment
attach/detach/change (ipcs -t), but ...

I've committed changes for 8.0 that mark a shmem segment with the inode
of the associated data directory; that should be a stable enough ID to
handle all routine-reboot cases. (If you had to restore your whole
filesystem from backup tapes, it might be wrong, but you're going to be
doing such recovery manually anyway ...)


...that will remove a major hassle for us and lots of others. Thanks.

Ed

Nov 23 '05 #15