Hi all,
if I accidentally use a TAKEOVER command with the BY FORCE clause while
primary and standby are in peer state, I end up with two primaries
(at least with FP10 on Windows). Is this working as designed, or is it a bug?
The manuals say that the standby will inform the primary about the takeover
but will not wait for an acknowledgement, so the primary knows what is
going on. In my view the primary should either switch to standby or
shut down immediately in this situation. What do you think?
TIA
Joachim
Hi, Joachim.
If HADR is functioning correctly, a TAKEOVER .. BY FORCE while the two
sites are in Peer state will result in (a) a new viable primary, and
(b) a zombie old primary. The poison pill that the standby sends to
the primary in this case does not itself shut the old primary down.
What it does do is hobble the primary such that it can no longer
generate any new log, and the next time that an agent attempts to do
so, it will bring down the server.
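The behaviour described above can be pictured with a small toy model. This is only a sketch of the idea, not DB2 code: the class, its method names, and the exception are all made up for illustration, and the real mechanism lives inside the engine.

```python
class DatabaseDown(Exception):
    """Raised when the toy primary brings itself down (hypothetical)."""


class ToyPrimary:
    """Toy stand-in for an old primary that receives a poison pill."""

    def __init__(self):
        self.role = "PRIMARY"      # the reported role is not changed by the pill
        self.log_blocked = False   # in-memory flag set by the poison pill
        self.active = True
        self.log = []

    def receive_poison_pill(self):
        # Only a flag is set here; nothing is shut down at this point.
        self.log_blocked = True

    def read(self):
        # Non-logged work keeps succeeding while the database is active.
        if not self.active:
            raise DatabaseDown("database is not active")
        return "row data"

    def write(self, record):
        # Any logged operation after the pill brings the database down.
        if not self.active:
            raise DatabaseDown("database is not active")
        if self.log_blocked:
            self.active = False
            raise DatabaseDown("log write attempted after forced takeover")
        self.log.append(record)
```

In this model the old primary keeps answering reads and keeps calling itself primary until the first write arrives, which is the "zombie" state described above.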
Note that the poison pill is a secondary mechanism intended to help
prevent split brain. A fundamental premise is that the primary is dead
when a TAKEOVER .. BY FORCE is issued. The poison pill is a backstop
in case of either (a) incorrect operation of the subsystem, or (b)
the primary being wedged such that it is alive but not functioning well
(perhaps the user can't even shut the primary db down without
impacting a larger-grain entity such as the entire instance or the host
machine).
In this scenario, switching the old primary's role to standby or
immediate shutdown of the old primary sounds nice, but the devil is in
the details. If the user wanted to switch roles, then the non-forced
takeover should have been issued. Since a forced takeover was issued,
we assume the primary is dead or at least in a world of hurt, and the
user has requested a failover. As such we're not at liberty to wait
for a clean shutdown or role transition on the old primary. (Note as
well that we can't be sure such action would be successful were we to
attempt it.)
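As a rough illustration of the choice between the two forms of the command, here is a minimal sketch. The helper and its primary_is_healthy flag are hypothetical (how you decide that is up to you or your cluster software), and the database alias in the usage note is only an example; the helper just builds the command text and does not run anything.

```python
def takeover_command(db_alias: str, primary_is_healthy: bool) -> str:
    """Build the TAKEOVER command to issue on the standby (sketch only)."""
    base = f"db2 takeover hadr on database {db_alias}"
    if primary_is_healthy:
        # Role switch: the primary cooperates and becomes the new standby.
        return base
    # Failover: the primary is assumed dead or badly wedged.
    return base + " by force"
```

For example, takeover_command("SAMPLE", True) gives the non-forced role switch, which is what you want when both sites are healthy and in Peer state.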
Anyway, you may well see the old primary still reporting its role as
primary after such an event, and it can even perform non-logged
operations, indefinitely. While this is not ideal, it should not be
mistaken for the system being split brained at that time (well, at
least the brain on one side is a read-only brain :-). Feel free to
shut down the old primary at your convenience.
We do have on our list of potential future enhancements an item to try
and shut down the primary more cleanly in this situation. This would
be at best an "attempt" and it would follow after the existing
mechanism. We currently only need to set a flag in memory in response
to the poison pill. To do more involves actions that may or may not
succeed if the old primary is wedged up somehow. (We don't just panic
the instance because it is not good-citizen behavior for a piece
of HA software to increase the scope of a failure.) It is important to
note that this potential enhancement is *not* a high priority for us,
as there are a number of higher-value potential HADR enhancements.
It's hard to make a business case to change the way the old primary
goes away from ugly to maybe more graceful in a scenario which is rare
or involves incorrect operation of the system, and where the change
does not really enhance either the availability or the consistency of the
system.
Finally, note that while the poison pill mechanism is intended to
prevent active/continuing split brain, it does not guarantee that
inconsistency is avoided during the event. The standby (new primary)
waits only very briefly for any last log traffic from the old primary.
We assume that the primary is dead (or ought to be) and that for
availability it is important not to delay failover in case some last
bit of log is struggling to flow across. It is possible that some log does
not make it across before the standby takes over in this case. The
consequence is that it may not be possible to reintegrate the old
primary as the new standby later, due to the divergence, and instead a
reinitialization will be required (back up the new primary and restore it
to the old primary/new standby).
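The divergence condition can be sketched as a simple comparison. This is a simplification under stated assumptions: log positions are treated as plain integers and the function name is made up; the actual check is done by DB2 itself when the old primary is started as a standby.

```python
def needs_reinit(old_primary_log_end: int, standby_log_end_at_takeover: int) -> bool:
    """Return True if the old primary holds log the new primary never received.

    Both arguments are hypothetical integer log positions. If the old primary
    wrote past the point where the standby took over, the two log streams have
    diverged and the old primary cannot simply rejoin as a standby.
    """
    return old_primary_log_end > standby_log_end_at_takeover
```

If every log record reached the standby before the takeover, the positions match and reintegration remains possible; otherwise the backup/restore route is needed.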
Regards,
- Steve P.
--
Steve Pearson, IBM DB2 for Linux, UNIX, and Windows, IBM Software Group
DB2 "Portland" Development Team, IBM Beaverton Lab, Beaverton, OR, USA
Steve,
thanks a lot for that very detailed explanation (again :-) ).
> there are a number of higher-value potential HADR enhancements
I was recently at a DB2 Viper workshop where I was told that there will
be no major HADR enhancements in DB2 V9 GA. Can you comment on this?
thanks again
Joachim
> I was recently at a DB2 Viper workshop where I was told that there will
> be no major HADR enhancements in DB2 V9 GA. Can you comment on this?
There are no HADR-specific features in DB2 9. However, HADR does
support new DB2 9 features such as XML data, compression,
range-partitioned tables, and IPv6.
Regards,
- Steve P.