Unable to start HADR reason code 7

gumby

I'm having trouble getting HADR to work with the sample databases on
two HS20 xSeries blades, Red Hat ES4 up3, DB2 8.2.4, getting the
following error.

SQL1768N Unable to start HADR. Reason code = "7" - The primary
database failed to establish a connection to its standby database
within the HADR timeout interval.

What should I check besides the remote host and remote service
parameters on the standby database, which seem to be correct? The
servers can see each other via ping etc. I have successfully set up
HADR on a single server.

thanks
dub

May 31 '06 #1

Mark A

I assume you have already started HADR on the standby database first, before
you started HADR on the primary. If that is true, then try logging on the
standby database and activating the standby database

db2 activate database sample

Then retry starting HADR on primary.

May 31 '06 #2

gumby

Yes, I think the control center runs the following commands anyway. And
if I activate the standby it says it is already activated. Here are the
final commands run.

-- Start HADR on standby database
--
DEACTIVATE DATABASE SAMPLE
START HADR ON DATABASE SAMPLE AS STANDBY
--
-- Start HADR on primary database
--
DEACTIVATE DATABASE SAMPLE
START HADR ON DATABASE SAMPLE AS PRIMARY

Just to clarify, I have successfully set up HADR between 2 different
databases on the same server using the Control Center GUI. My problem
is between databases on two different servers. I have tried the manual
command method and the Control Center, both with the same results.

Using the control center commands

Standby diag file ends with

2006-05-31-16.24.26.101725-240 E476637G362 LEVEL: Event
PID : 27068 TID : 3086558912 PROC : db2hadrs (SAMPLE) 0
INSTANCE: sample NODE : 000 DB : SAMPLE
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-RemoteCatchupPending (was S-LocalCatchup)

2006-05-31-16.24.25.999932-240 I477000G398 LEVEL: Warning
PID : 27057 TID : 3086558912 PROC : db2agent (SAMPLE) 0
INSTANCE: sample NODE : 000 DB : SAMPLE
APPHDL : 0-14 APPID: *LOCAL.sample.060531202426
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduStartup, probe:21152
MESSAGE : Info: HADR Startup has completed.

Primary diag file ends with

2006-05-31-16.24.32.714718+600 E128512G336 LEVEL: Event
PID : 9575 TID : 3085870784 PROC : db2hadrp (SAMPLE) 0
INSTANCE: sample NODE : 000 DB : SAMPLE
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to P-Boot (was None)

2006-05-31-16.24.32.719416+600 I128849G318 LEVEL: Warning
PID : 9575 TID : 3085870784 PROC : db2hadrp (SAMPLE) 0
INSTANCE: sample NODE : 000 DB : SAMPLE
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduP, probe:20301
MESSAGE : Info: Primary Started.

2006-05-31-16.26.18.769577+600 I129489G321 LEVEL: Event
PID : 5376 TID : 2947414960 PROC : db2hmon
INSTANCE: sample NODE : 000
FUNCTION: DB2 UDB, Automatic Table Maintenance, db2HmonEvalStats, probe:900
STOP : Automatic Runstats: evaluation has finished on database SAMPLE

2006-05-31-16.26.33.712145+600 I129811G571 LEVEL: Error
PID : 9575 TID : 3085870784 PROC : db2hadrp (SAMPLE) 0
INSTANCE: sample NODE : 000 DB : SAMPLE
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduP, probe:20390
MESSAGE : HADR primary did not establish connection with standby within timeout
          and will shut down. BY FORCE option required to start primary without
          standby. Timeout seconds =
DATA #1 : Hexdump, 4 bytes
0x12C13A3C : 7800 0000 x...

2006-05-31-16.26.33.712399+600 I130383G418 LEVEL: Error
PID : 9575 TID : 3085870784 PROC : db2hadrp (SAMPLE) 0
INSTANCE: sample NODE : 000 DB : SAMPLE
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduP, probe:20390
RETCODE : ZRC=0x8280001A=-2105540582=HDR_ZRC_NO_STANDBY
          "Comm time-out in unforced HADR primary start, to avoid split-brain"

2006-05-31-16.26.33.712573+600 I130802G319 LEVEL: Warning
PID : 9575 TID : 3085870784 PROC : db2hadrp (SAMPLE) 0
INSTANCE: sample NODE : 000 DB : SAMPLE
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduP, probe:20302
MESSAGE : Info: Primary Finished.

2006-05-31-16.26.33.712704+600 I131122G422 LEVEL: Error
PID : 9575 TID : 3085870784 PROC : db2hadrp (SAMPLE) 0
INSTANCE: sample NODE : 000 DB : SAMPLE
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduEntry, probe:21100
RETCODE : ZRC=0x8280001A=-2105540582=HDR_ZRC_NO_STANDBY
          "Comm time-out in unforced HADR primary start, to avoid split-brain"

Any assistance greatly appreciated

cheers
dub

May 31 '06 #3
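
A side note on reading the primary's diag log in the post above: the 4-byte hexdump after "Timeout seconds =" and the ZRC return codes can be decoded by hand. The following is a minimal Python sketch, assuming the dump is a little-endian 32-bit integer (a guess based on the Intel blades in use, not something the thread confirms). The function names are made up for illustration.

```python
import struct


def decode_dump(hex_bytes: str, signed: bool = False) -> int:
    """Decode a 4-byte db2diag hexdump such as '7800 0000'.

    Assumption: the bytes are a little-endian 32-bit integer.
    """
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    return struct.unpack("<i" if signed else "<I", raw)[0]


def zrc_to_signed(zrc_hex: str) -> int:
    """Convert a ZRC such as '0x8280001A' to a signed 32-bit value."""
    value = int(zrc_hex, 16)
    return value - (1 << 32) if value & 0x80000000 else value


print(decode_dump("7800 0000"))     # 120
print(zrc_to_signed("0x8280001A"))  # -2105540582
```

The first value, 120, lines up with the roughly two minutes between "Primary Started" (16.24.32) and the time-out error (16.26.33) in the log. The second matches the signed number printed next to the ZRC in the RETCODE line.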

Mark A

Can you post your db config parms (HADR section only) on both primary and
standby databases?

Also, post output from "db2level" and the OS you are using.

May 31 '06 #4

Steve Pearson (news only)

From the snippets of diag log shown, it appears that the standby was
not able to establish a socket connection with the primary (primary
listens, standby connects). It seems fairly common that this is not
correctly configured on the first attempt. We've seen issues with
incorrect HADR parameters, DNS problems, failure to properly set up
service names, and inability to correctly map across a NAT.

Double check that your HADR comms parameters mesh up correctly (each
side properly refers to itself in LOCAL params and to the other in
REMOTE params).

HADR_LOCAL_HOST
HADR_LOCAL_SVC
HADR_REMOTE_HOST
HADR_REMOTE_SVC

Ensure that your service names are registered and/or use IP addresses.
Try using fully-specified network naming (a.b.c.d) for host names if
you haven't already.

HTH.

Regards,
- Steve P.
--
Steve Pearson, IBM DB2 UDB for LUW Development, IBM Software Group
DB2 "Portland" Team, IBM Beaverton Lab, Beaverton, OR, USA

May 31 '06 #5

Phil Sherman

Your ability to get this working on a single system indicates that you
have the knowledge to do this from the database perspective.

A common cause of problems when going from one system to two systems,
especially with Linux, is the requirement to pass through the firewalls.
Make sure they are configured to allow the HADR ports to pass traffic.

Phil Sherman

May 31 '06 #6

Mark A

He is using the GUI interface. I was able to configure HADR on a local
Windows box with the GUI, but not with remote Linux boxes. Using command
line configuration scripts on remote Linux boxes worked fine.

Jun 1 '06 #7

gumby

I'm running Red Hat ES4 up3

[sample@tank ~]$ uname -r
2.6.9-34.ELsmp

[sample@tank ~]$ db2level
DB21085I Instance "sample" uses "32" bits and DB2 code release "SQL08024" with
level identifier "03050106".
Informational tokens are "DB2 v8.1.2.104", "s060120", "MI00152", and FixPak
"11".
Product is installed at "/opt/IBM/db2/V8.1".

STANDBY - tank
HADR database role = STANDARD
HADR local host name (HADR_LOCAL_HOST) = tank
HADR local service name (HADR_LOCAL_SVC) = DB2_HADR_2
HADR remote host name (HADR_REMOTE_HOST) = dozer
HADR remote service name (HADR_REMOTE_SVC) = DB2_HADR_1
HADR instance name of remote server (HADR_REMOTE_INST) = sample
HADR timeout value (HADR_TIMEOUT) = 120
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC

PRIMARY - dozer
HADR database role = STANDARD
HADR local host name (HADR_LOCAL_HOST) = dozer
HADR local service name (HADR_LOCAL_SVC) = DB2_HADR_1
HADR remote host name (HADR_REMOTE_HOST) = tank
HADR remote service name (HADR_REMOTE_SVC) = DB2_HADR_2
HADR instance name of remote server (HADR_REMOTE_INST) = sample
HADR timeout value (HADR_TIMEOUT) = 120
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC

/etc/services
# Local services
DB2_sample 60000/tcp
DB2_sample_1 60001/tcp
DB2_sample_2 60002/tcp
DB2_sample_END 60003/tcp
DB2_HADR_1 55001/tcp
DB2_HADR_2 55002/tcp

Currently doing some more tests with and without the GUI on the Linux
boxes.

Jun 1 '06 #8
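
Steve's earlier advice (each side refers to itself in the LOCAL parameters and to the other side in the REMOTE parameters) can be checked mechanically against the values just posted. This is only a sketch: the function name is hypothetical, and the assumption that HADR_TIMEOUT and HADR_SYNCMODE should be identical on both sides comes from general HADR practice, not from this thread.

```python
def hadr_mismatches(side_a: dict, side_b: dict) -> list:
    """Return descriptions of HADR settings that do not mirror each other."""
    problems = []
    pairs = ((side_a, side_b, "A", "B"), (side_b, side_a, "B", "A"))
    for kind in ("HOST", "SVC"):
        for me, peer, me_name, peer_name in pairs:
            local = me[f"HADR_LOCAL_{kind}"]
            remote = peer[f"HADR_REMOTE_{kind}"]
            if local != remote:
                problems.append(
                    f"{me_name} HADR_LOCAL_{kind}={local} but "
                    f"{peer_name} HADR_REMOTE_{kind}={remote}"
                )
    # Assumption: these two values are expected to match on both sides.
    for key in ("HADR_TIMEOUT", "HADR_SYNCMODE"):
        if side_a[key] != side_b[key]:
            problems.append(f"{key} differs: {side_a[key]} vs {side_b[key]}")
    return problems


# Values copied from the db cfg output posted above.
tank = {
    "HADR_LOCAL_HOST": "tank", "HADR_LOCAL_SVC": "DB2_HADR_2",
    "HADR_REMOTE_HOST": "dozer", "HADR_REMOTE_SVC": "DB2_HADR_1",
    "HADR_TIMEOUT": 120, "HADR_SYNCMODE": "NEARSYNC",
}
dozer = {
    "HADR_LOCAL_HOST": "dozer", "HADR_LOCAL_SVC": "DB2_HADR_1",
    "HADR_REMOTE_HOST": "tank", "HADR_REMOTE_SVC": "DB2_HADR_2",
    "HADR_TIMEOUT": 120, "HADR_SYNCMODE": "NEARSYNC",
}

print(hadr_mismatches(tank, dozer))  # []
```

The posted configuration passes this check, which suggests the parameters themselves mirror each other correctly and the problem lies elsewhere (ports, firewall, or name resolution).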

gumby

Are there any requirements for the servers to be cataloged? I mean, on
the primary, should there be a catalog/node entry (not sure of the
correct terms) for the standby, and likewise an entry on
the standby pointing to the primary?

Should they show up when running the command db2 list node
directory?

Jun 1 '06 #9

Mark A

This may not help, but I would use the port number in your db config, and
not the service name (but leave the service names in the /etc/services).

I assume the database names are the same on primary and standby (not
specified in your post above).

When the HADR database role is STANDARD, then that means that HADR has not
been started. So manually "start HADR on db xxxxxxx as standby" (on tank),
and then (if successful) "start HADR on db xxxxxxx as primary" (on dozer).
You must start HADR on the standby first.

If the above does not work, then you should check the ports (at the OS
level) to make sure no one else is using 55001 and 55002. The recommended
HADR ports start with 18819 (although I don't know why, and don't know
if this matters).

A useful monitoring tool of the current HADR status without the GUI is to
take a database snapshot (refer to HADR section):
db2 get snapshot for database on xxxxxxxx

As I said previously, I was not able to get the GUI to work for HADR on
Linux, but there are very few commands needed to get it working, so it is
easy to script from the command line.

Jun 1 '06 #10
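
One way to do the OS-level port check Mark mentions is to try binding the port yourself. A minimal Python sketch follows; the function name is made up, and it should be run while HADR is stopped, since a running HADR process would itself hold the port.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
        return True


# Example use on either server (ports taken from the /etc/services listing):
#   for port in (55001, 55002):
#       print(port, "free" if port_is_free(port) else "in use")
```

A port that reports "in use" with HADR stopped is being held by some other process and is worth changing.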

Mark A

No, the nodes or databases on the other server (standby or primary) do not
need to be catalogued in the local node or db directory.

Jun 1 '06 #11

gumby

These are the commands I have run to try to get HADR going (basically
cut and pasted from what the GUI displays). Do they look okay? Any
suggestions? I have used the ports you (Mark) suggested; everything else
is what the HADR GUI comes up with.

--
-- Copy backup images from primary to standby system.
--
-- Location on primary system : /home/sample
-- Location on standby system : /home/sample
--
-- Restore database on standby system - TANK - SAMPLET (sample) - SAMPLET (SAMPLE)
--
RESTORE DATABASE SAMPLE FROM "/home/sample" TAKEN AT 20060602104857 REPLACE HISTORY FILE WITHOUT PROMPTING
--
-- Configure databases for client reroute - DOZER - sample - SAMPLE
--
UPDATE ALTERNATE SERVER FOR DATABASE SAMPLE USING HOSTNAME tank PORT 60000
--
-- Configure databases for client reroute - TANK - SAMPLET (sample) - SAMPLET (SAMPLE)
--
UPDATE ALTERNATE SERVER FOR DATABASE SAMPLE USING HOSTNAME dozer PORT 60000
--
-- Update service file on primary system - DOZER
-- Service name : DB2_HADR_1
-- Port number : 18819
-- Service name : DB2_HADR_2
-- Port number : 18820
--
-- Update service file on standby system - TANK
-- Service name : DB2_HADR_1
-- Port number : 18819
-- Service name : DB2_HADR_2
-- Port number : 18820
--
-- Update HADR configuration parameters on primary database - DOZER - sample - SAMPLE
--
UPDATE DB CFG FOR SAMPLE USING HADR_LOCAL_HOST dozer
UPDATE DB CFG FOR SAMPLE USING HADR_LOCAL_SVC DB2_HADR_1
UPDATE DB CFG FOR SAMPLE USING HADR_REMOTE_HOST tank
UPDATE DB CFG FOR SAMPLE USING HADR_REMOTE_SVC DB2_HADR_2
UPDATE DB CFG FOR SAMPLE USING HADR_REMOTE_INST sample
UPDATE DB CFG FOR SAMPLE USING HADR_SYNCMODE NEARSYNC
UPDATE DB CFG FOR SAMPLE USING HADR_TIMEOUT 300
CONNECT TO SAMPLE
QUIESCE DATABASE IMMEDIATE FORCE CONNECTIONS
UNQUIESCE DATABASE
CONNECT RESET
--
-- Update HADR configuration parameters on standby database - TANK - SAMPLET (sample) - SAMPLET (SAMPLE)
--
UPDATE DB CFG FOR SAMPLE USING HADR_LOCAL_HOST tank
UPDATE DB CFG FOR SAMPLE USING HADR_LOCAL_SVC DB2_HADR_2
UPDATE DB CFG FOR SAMPLE USING HADR_REMOTE_HOST dozer
UPDATE DB CFG FOR SAMPLE USING HADR_REMOTE_SVC DB2_HADR_1
UPDATE DB CFG FOR SAMPLE USING HADR_REMOTE_INST sample
UPDATE DB CFG FOR SAMPLE USING HADR_SYNCMODE NEARSYNC
UPDATE DB CFG FOR SAMPLE USING HADR_TIMEOUT 300
--
-- Start HADR on standby database - TANK - SAMPLET (sample) - SAMPLET (SAMPLE)
--
DEACTIVATE DATABASE SAMPLE
START HADR ON DATABASE SAMPLE AS STANDBY
--
-- Start HADR on primary database - DOZER - sample - SAMPLE
--
DEACTIVATE DATABASE SAMPLE
START HADR ON DATABASE SAMPLE AS PRIMARY
;

When run in this order, the standby database ends up in a remote catch-up
pending state, as reported by the HADR GUI management tool and the
snapshot command.

HADR Status
Role = Standby
State = Remote catchup pending
Synchronization mode = Nearsync
Connection status = Disconnected, 02-06-2006 11:17:01.627914
Heartbeats missed = 0
Local host = tank
Local service = DB2_HADR_2
Remote host = dozer
Remote service = DB2_HADR_1
Remote instance = sample
timeout(seconds) = 300
Primary log position(file, page, LSN) = S0000000.LOG, 0, 0000000000000000
Standby log position(file, page, LSN) = S0000001.LOG, 0, 0000000001388000
Log gap running average(bytes) = 0

diag results.

2006-06-02-11.07.01.737895-240 E79516G362 LEVEL: Event
PID : 25906 TID : 3086079680 PROC : db2hadrs (SAMPLE) 0
INSTANCE: sample NODE : 000 DB : SAMPLE
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-RemoteCatchupPending (was S-LocalCatchup)

2006-06-02-11.07.01.636782-240 I79879G398 LEVEL: Warning
PID : 25247 TID : 3086079680 PROC : db2agent (SAMPLE) 0
INSTANCE: sample NODE : 000 DB : SAMPLE
APPHDL : 0-39 APPID: *LOCAL.sample.060602150702
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduStartup, probe:21152
MESSAGE : Info: HADR Startup has completed.

Is there a command I can run to test the connection that the standby
database attempts to make to the primary when HADR starts up?

Jun 2 '06 #12
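
On the question of testing the connection outside DB2: a plain TCP connect to the other side's HADR port gives a rough answer. The sketch below is not a DB2 tool, just a generic socket probe with a made-up function name. It assumes HADR has been started on the target side so that something is listening on its HADR_LOCAL_SVC port.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt one TCP connection; return 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening or a firewall rejects the port.
        return "refused"
    except socket.timeout:
        # No answer at all, which often points to a firewall silently dropping packets.
        return "timeout"
    except OSError as exc:
        # Includes name resolution failures and unreachable networks.
        return f"error: {exc}"


# Example, run on tank while the primary on dozer is starting:
#   print(probe("dozer", 18819))
# and in the other direction, run on dozer:
#   print(probe("tank", 18820))
```

Trying the probe once with the host name and once with the IP address of the same machine can also show whether name resolution makes a difference.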

Mark A

The ports don't really matter so long as no one else is using them. Each
database must have its own ports (in case you have more than one database on
that server using HADR). They should be documented in /etc/services.

Here are the scripts that I use (run in order as indicated):

Assumptions:

server01 - primary db server
server02 - standby db server
db2inst1 - instance on primary server
db2inst2 - DB2 instance on standby server (but can be the same as primary)
database - sample

SCRIPT01 - RUN ON PRIMARY SERVER01

# Activate log retain and set log archive path (not necessary if logretain already enabled some other way)
db2 update db cfg for sample using LOGARCHMETH1 DISK:/db2/archive_logs

# Create offline backup of db to be restored on standby server02 (I am backing up to a shared mount point)
db2 "BACKUP DATABASE sample TO /db_backup/SAMPLE COMPRESS WITHOUT PROMPTING"

db2 update db cfg for sample using HADR_LOCAL_HOST server01
db2 update db cfg for sample using HADR_REMOTE_HOST server02

db2 update db cfg for sample using HADR_LOCAL_SVC 18819
db2 update db cfg for sample using HADR_REMOTE_SVC 18820
db2 update db cfg for sample using HADR_REMOTE_INST db2inst2
db2 update db cfg for sample using HADR_SYNCMODE nearsync
db2 update db cfg for sample using HADR_TIMEOUT 30
db2 update db cfg for sample using LOGINDEXBUILD ON

#Recommended parms for HADR because logs are sent to standby server
db2 update db cfg for sample using DBHEAP 2048
db2 update db cfg for sample using LOGBUFSZ 256

#This is the host name for automatic client re-route:
db2 update alternate server for database sample using hostname server02 port 50000

SCRIPT02 - RUN ON STANDBY SERVER02

# Restore database on standby server02
db2 RESTORE DATABASE sample FROM /db_backup/SAMPLE TAKEN AT 20060204213007 replace history file

# Activate log retain and set log archive path
db2 update db cfg for sample using LOGARCHMETH1 DISK:/db2/archive_logs

db2 update db cfg for sample using HADR_LOCAL_HOST server02
db2 update db cfg for sample using HADR_REMOTE_HOST server01

db2 update db cfg for sample using HADR_LOCAL_SVC 18820
db2 update db cfg for sample using HADR_REMOTE_SVC 18819
db2 update db cfg for sample using HADR_REMOTE_INST db2inst1
db2 update db cfg for sample using HADR_SYNCMODE nearsync
db2 update db cfg for sample using HADR_TIMEOUT 30
db2 update db cfg for sample using LOGINDEXBUILD ON

#Recommended parms for HADR because logs are sent to standby server
db2 update db cfg for sample using DBHEAP 2048
db2 update db cfg for sample using LOGBUFSZ 256

#This is the host name for automatic client re-route:
db2 update alternate server for database sample using hostname server01 port 50000

db2 start hadr on db sample as standby

SCRIPT03 - RUN ON PRIMARY SERVER01

db2 start hadr on db sample as primary

Jun 2 '06 #13
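
Because the two scripts above differ only in which side is "local" and which is "remote", the HADR lines can be generated from one description of the pair, which avoids typing mistakes when mirroring them. A rough Python sketch using the same placeholder names as the scripts (server01, server02, db2inst1, db2inst2); the function names are invented for illustration.

```python
def hadr_cfg_commands(db, local_host, local_port, remote_host, remote_port,
                      remote_inst, syncmode="nearsync", timeout=30):
    """Build the 'db2 update db cfg' lines for one side of an HADR pair."""
    settings = [
        ("HADR_LOCAL_HOST", local_host),
        ("HADR_LOCAL_SVC", local_port),
        ("HADR_REMOTE_HOST", remote_host),
        ("HADR_REMOTE_SVC", remote_port),
        ("HADR_REMOTE_INST", remote_inst),
        ("HADR_SYNCMODE", syncmode),
        ("HADR_TIMEOUT", timeout),
    ]
    return [f"db2 update db cfg for {db} using {name} {value}"
            for name, value in settings]


def hadr_pair_commands(db, primary, standby):
    """primary and standby are dicts with the keys host, port and inst."""
    return {
        "primary": hadr_cfg_commands(db, primary["host"], primary["port"],
                                     standby["host"], standby["port"],
                                     standby["inst"]),
        "standby": hadr_cfg_commands(db, standby["host"], standby["port"],
                                     primary["host"], primary["port"],
                                     primary["inst"]),
    }


cmds = hadr_pair_commands(
    "sample",
    {"host": "server01", "port": 18819, "inst": "db2inst1"},
    {"host": "server02", "port": 18820, "inst": "db2inst2"},
)
for side in ("primary", "standby"):
    print(f"# run on {side}")
    print("\n".join(cmds[side]))
```

The backup, restore, log archive and alternate server lines are left out on purpose; they would still be run by hand as in the scripts.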

gumby

Thanks Mark,

I used those scripts but still no luck, same error. I confirmed the ports
are not used by anything else, and I monitored the port on the
primary for activity when the standby should be connecting and
confirmed that the standby is attempting to contact it when the HADR
commands are run.

I checked the diag logs and get the following:

2006-06-05-10.40.47.894980+600 I406658G426 LEVEL: Severe
PID : 18468 TID : 3086255808 PROC : db2hadrs (SAMPLE) 0
INSTANCE: db2inst3 NODE : 000 DB : SAMPLE
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20280
MESSAGE : Failed to connect to primary. rc:
DATA #1 : Hexdump, 4 bytes
0xBFE10050 : 1900 0F81 ....

2006-06-05-10.40.47.895157+600 I407085G370 LEVEL: Severe
PID : 18468 TID : 3086255808 PROC : db2hadrs (SAMPLE) 0
INSTANCE: db2inst3 NODE : 000 DB : SAMPLE
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20280
RETCODE : ZRC=0x810F0019=-2129723367=SQLO_CONN_REFUSED "Connection refused"

Any other ideas? I think I will attempt a re-install of DB2, fixpacks
and sample instances and try again in case I stuffed up something with
the installation or user setup.

cheers
dub

Jun 5 '06 #14

Mark A

If reinstall does not work, you should open a PMR with IBM support (assuming
you have a support contract).

Jun 5 '06 #15

Steve Pearson (news only)

Have you tried using fully qualified network naming or IP addresses for
the HADR host configuration parameters? We've seen cases where there
are unexpected problems with name resolution on the primary. For
example, "host" may not work where "host.subnet.domain.com" does.
I'm not a networking guru and can't explain in detail why this occurs,
but HADR is fairly picky that the configured host name matches the
host that attempts to connect to the primary, as determined via host
name resolution.

Regards,
- Steve P.
--
Steve Pearson, IBM DB2 UDB for LUW Development, IBM Software Group
DB2 "Portland" Team, IBM Beaverton Lab, Beaverton, OR, USA

Jun 5 '06 #16

gumby

Thanks everyone...

got this finally resolved using IP address for host parameters under
HADR (thx Steve and Mark). This was a really frustrating problem, as
the hosts were all well defined in /etc/hosts and resolved fine on the
OS level, anywho all is well now...

For example:

Standby

HADR database role = STANDBY
HADR local host name (HADR_LOCAL_HOST) = 10.18.78.64
HADR local service name (HADR_LOCAL_SVC) = 55002
HADR remote host name (HADR_REMOTE_HOST) = 10.18.78.62
HADR remote service name (HADR_REMOTE_SVC) = 55001
HADR instance name of remote server (HADR_REMOTE_INST) = db2inst1
HADR timeout value (HADR_TIMEOUT) = 120
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC

Primary

HADR database role = PRIMARY
HADR local host name (HADR_LOCAL_HOST) = 10.18.78.62
HADR local service name (HADR_LOCAL_SVC) = 55001
HADR remote host name (HADR_REMOTE_HOST) = 10.18.78.64
HADR remote service name (HADR_REMOTE_SVC) = 55002
HADR instance name of remote server (HADR_REMOTE_INST) = db2inst2
HADR timeout value (HADR_TIMEOUT) = 120
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC

thanks again

dub

Dubravko Akmacic
Automation Engineer
Industrial Markets - Strip & Plate
BlueScope Steel Limited

Jun 6 '06 #17
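
The thread never pins down why the host names failed where IP addresses worked. One thing worth ruling out in a similar situation (an assumption, not something confirmed here) is a host name that resolves to a loopback address or to different addresses on the two servers. A small Python sketch with made-up function names that prints what the local resolver returns:

```python
import ipaddress
import socket


def resolve(name: str) -> list:
    """Return the sorted, de-duplicated IPv4 addresses the local resolver gives for name."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def is_loopback(address: str) -> bool:
    """True for addresses in 127.0.0.0/8."""
    return ipaddress.ip_address(address).is_loopback


def report(names) -> None:
    """Print how each name resolves on this machine and flag loopback results."""
    for name in names:
        try:
            addresses = resolve(name)
        except socket.gaierror as exc:
            print(f"{name}: does not resolve ({exc})")
            continue
        flagged = [f"{a} (loopback!)" if is_loopback(a) else a for a in addresses]
        print(f"{name}: {', '.join(flagged)}")


# Example, run on BOTH servers and compare the output:
#   report(["tank", "dozer"])
```

If the output differs between the two servers, or a server's own name comes back as a loopback address, that would fit Steve's remark that HADR is picky about the configured host name matching the connecting host.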
