Mark A wrote:[color=blue]
> "Steve Pearson (news only)" <stevep222@my-deja.com> wrote in message
> news:1138068189.696082.167740@g44g2000cwa.googlegroups.com...[color=green]
> > The failed primary will retain the role of primary if it is simply
> > restarted (i.e., via application connection *attempt*, activate db, or
> > restart db command).
> >
> > However, importantly, it should *not* allow an application connection
> > to succeed unless the standby is there and successfully re-pairs with
> > it. Rather, since the original standby took over and is no longer a
> > standby, the activation or connection to the original primary database
> > should be delayed for HADR_TIMEOUT (or 30 seconds if that's longer),
> > then fail with error SQL1768N reason 7 ("The primary database failed to
> > establish a connection to its standby database within the HADR timeout
> > interval").
> >
> > If you observe otherwise, please report it to IBM as a defect.
> >
> > Now, if you *force* the restarting original primary to start in primary
> > role (it should require the START HADR .. AS PRIMARY BY FORCE command
> > to do so), then it will oblige. Starting "by force" tells HADR you
> > want to forget about the requirement for the standby to be there, since
> > you know there's good reason for it to be gone (maybe both primary and
> > standby failed concurrently, and the original primary is the first to
> > be restarted). If you happen to do this while the original standby has
> > meanwhile taken over as primary, guess what...self inflicted split
> > brain results.
> >
> > Regarding "STOP HADR", yes, that command will make HADR go away.
> > Whether or not you can follow it by a successful attempt to restart
> > HADR depends on whether the database is in a valid initialization
> > state (because HADR would be starting over from scratch). For example,
> > the standby should be in rollforward mode and with a database and log
> > stream that matches well with that of the primary. It is possible that
> > if you do nothing but stop hadr followed by start hadr, it might just
> > work. However, issuing stop hadr is not advisable if you really wanted
> > the current instantiation of the db to play HADR again later w/o
> > starting over from scratch. If you want to temporarily stop log
> > shipping, a better approach is to issue the "deactivate db" command at
> > the standby.
> >
> > Regards,
> > - Steve P.
> > ------------------------------------
> > Steve Pearson
> > IBM DB2 UDB for LUW Development
> > Portland, OR, USA
> >[/color]
>
> Steve, I appreciate your comments, but let's get back to the question I
> raised. For the purposes of this discussion, please assume that I am fairly
> knowledgeable about HADR, having worked with it for several months now, so
> let's dispense with the fundamentals.
>
> If the original primary server crashes (assume a hardware failure of some
> kind), we will do an HADR takeover by force (force is necessary because the
> original primary is unreachable) and original standby is now the primary.
> Obviously databases are no longer in peer state if the original primary
> server crashes because of hardware failure. Once the takeover has occurred,
> DB2 automatic client reroute (or whatever mechanism one chooses) will point
> the applications to the new primary server (which was previously the standby
> database). Processing of the application continues normally.
>
> Now, at some subsequent point, we will fix the hardware problem with the
> original primary and attempt to bring it online as the standby. After it is
> brought online as the standby, it will catch up with the logs, and only then
> we can do a HADR takeover (without force) to make it the primary again. I
> don't think the timeout is relevant since I am assuming that original
> primary will be down for several hours before it can be repaired.
>
> However, the problem is how do I bring the original primary server back up
> after hardware repair as the standby. In its last state before the server
> crashed, it thought it was the primary, and since a HADR takeover has now
> occurred and the original standby is now the primary, then I will have 2
> primary databases (split brain) when the original primary is repaired and
> brought back up. Any new connections might go to the original primary before
> I have a chance to make it the standby by issuing the command:
> db2 "start hadr on database <original_primary_database> as STANDBY"
>
> So how do I prevent a split brain (even for a short period) when my primary
> server crashes and I bring it back online, and before I can designate it as
> the standby (I already have a primary running)? This seems like a
> fundamental issue that must be solved for HADR to provide a continuous
> availability solution.
>
> One of the things that I think DB2 should do, is that any database where
> HADR is configured should attempt to establish peer state before any
> connections are allowed, and if the other database is already in primary
> role, and it was activated first, the last database to be activated should
> either automatically start as standby, or should not allow connections until
> some affirmative action is taken by the DBA (allowing the DBA to designate
> it as standby before any connections are allowed).
>
> In the absence of DB2 providing the above capability, perhaps there are some
> procedural things that can be done to not allow connections when the server
> is brought back up, allowing the DBA to make it standby. But I don't see how
> this can be done via SQL statements (such as revoke connection authority)
> since the revoke can only be issued on a database that is primary and
> available for new connections.[/color]
One way of doing it requires a start-up script for DB2. First we make sure
that DB2 cannot auto-start on either server of the HADR pair. Then, as part
of each start-up script, both databases are placed in the standby role, and
one of the servers is then changed to be the primary. In the case of a
hardware crash, we don't have to worry about a split brain once the hardware
problem is fixed. However, a DBA must now be involved on every system
reboot, which in some shops happens on a schedule. We are still working on a
script to automatically start the databases in the correct mode. One
approach we are looking into is reading the db configuration files from both
servers and then determining which database to start as the primary....
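
A minimal sketch of such start-up helpers, not a finished script. The
function names, the policy of always rejoining as standby, and the assumption
that "db2 get db cfg" prints a line labelled "HADR database role" are
illustrative assumptions. The functions only print the db2 commands, so a DBA
can review them before anything is run.

```shell
#!/bin/sh
# Sketch only. Assumes instance auto-start was already disabled
# (for example with: db2iauto -off <instance>), so nothing comes up
# as primary before this script has had its say.

# Read "db2 get db cfg for <db>" output on stdin and print the HADR role
# (STANDARD, PRIMARY or STANDBY). The label text is an assumption.
hadr_role_from_cfg() {
    sed -n 's/.*HADR database role[^=]*= *\([A-Z][A-Z]*\).*/\1/p' | head -n 1
}

# Print the command the start-up script would run for database $1.
# Assumed policy: always rejoin as standby, never promote automatically.
hadr_start_command() {
    printf 'db2 start hadr on database %s as standby\n' "$1"
}

# Print the (non-forced) takeover a DBA would run on the chosen primary.
# $1 = database name, $2 = role reported by the partner server.
# Refuses when the partner already claims the primary role.
hadr_promote_command() {
    if [ "$2" = "PRIMARY" ]; then
        echo "refuse: partner is already primary" >&2
        return 1
    fi
    printf 'db2 takeover hadr on database %s\n' "$1"
}
```

Typical use would be something like
db2 get db cfg for SAMPLE | hadr_role_from_cfg
on each server, followed by the output of hadr_start_command SAMPLE on the
server that just rebooted. SAMPLE is a placeholder database name.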
doug
www.db2helpdesk.com