High Availability Technologies for DB2

Availability for a brave new world

With the advent of Software as a Service (SaaS), more businesses are relying on the ability to access their business data through web-based applications. In addition to the rise of SaaS and Cloud Computing, our businesses are increasingly operating on a global scale. Where once you could schedule your maintenance updates for Sunday night, that same window now affects users on the other side of the globe.

When downtime is unplanned, however, these issues multiply tenfold. Unplanned outages are far more visible to users and the public at large, with potential ramifications for revenue, brand image and customer satisfaction.

In this paper we will look at the various solutions to the application availability issue for DB2 databases and how they meet the demands of our ever-changing global operations.

Availability solutions for DB2 databases
Let's look first at the newest high availability solution to enter the market: DB2 pureScale.

DB2 pureScale is a new optional DB2 feature that allows you to have multiple database servers in a system that all share a common set of disks, providing both scalability and availability.

This new technology includes:
• Automatic workload balancing to ensure that no node in the system is overloaded. DB2 routes transactions or connections to the least heavily used server. This workload balancing is hidden from the end user, and even from applications, because the DB2 client handles it: the client periodically checks the workload levels and re-routes transactions to different servers. Balancing can occur at either the transaction or the connection level. Transaction-level support was added because many customers and ERP systems use connection pooling, and without it workloads might never be moved.
• DB2 pureScale is built on the most reliable UNIX system available: Power Systems. Other platforms will be available in the future. The DB2 and Power Systems teams worked very closely on DB2 pureScale to ensure that it is optimized for AIX at all levels, be it memory, networking or storage.
• The technology for globally sharing locks and memory is based on technology from z/OS which has a great track record of being the most reliable and scalable architecture available.
• Tivoli System Automation has been integrated deeply into DB2 pureScale. It is installed and configured as part of the DB2 installation process, and DBAs and system administrators never even know it's there. The DB2 fix packs also include and apply any Tivoli updates, so DBAs and system administrators never need to learn another software product.
• The networking infrastructure leverages InfiniBand, and all additional clustering software is included as part of the DB2 pureScale installation. This technology has allowed us to avoid many scaling problems other vendors have run into.
• The core of the system is a shared disk architecture.
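The workload balancing described in the first point above can be sketched in a few lines. This is a minimal illustration only, not the DB2 client's actual algorithm: the `Member` class, the integer `load` percentage and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One database server in the cluster (hypothetical model)."""
    name: str
    load: int  # assumed load metric, in percent of capacity


def pick_member(members):
    """Return the least heavily used member."""
    if not members:
        raise ValueError("no members available")
    return min(members, key=lambda m: m.load)


def route_transactions(members, costs):
    """Route each transaction to whichever member is least loaded right now.

    `costs` holds an assumed load increment per transaction. Because the
    choice is made per transaction, work can move between members even
    when the application keeps a single pooled connection open.
    """
    placement = []
    for cost in costs:
        target = pick_member(members)
        target.load += cost
        placement.append(target.name)
    return placement


members = [Member("member0", 50), Member("member1", 20), Member("member2", 35)]
print(route_transactions(members, [10, 10, 10]))
```

With connection-level balancing the choice would be made once, when the connection is opened; a long-lived pooled connection would then stay on the same member however the loads changed afterwards.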

There are a number of high availability & disaster recovery solutions which have been in the marketplace for some time.

Active-passive clustering is a good general purpose high availability solution within a local environment. It typically provides a warm standby solution – i.e. an outage in the primary server is detected by the backup, which then takes over. The main stumbling block with this method is that it cannot work over a long distance and so is really only suitable for a single location solution.

With an active-passive clustering solution an organisation typically has an active or primary server and a passive or standby server. The TCO of this solution can be relatively high with expensive hardware resources sitting idle on the standby server. In addition to the warm standby server some organisations set up an additional standby within a separate DR site.

A heartbeat between servers detects when the primary server goes down and moves services across to the failover server. There is generally an outage between the primary server failing and the standby server detecting the change in state.

However, this is a solution used by many organisations across Europe and the US, especially within the banking sector.

Examples of active-passive implementations are HACMP on AIX, and HADR in DB2 UDB for Linux, UNIX and Windows.

HADR or High Availability Disaster Recovery for DB2 from IBM works in a similar way with a primary server and a standby server. The difference here is that the primary server processes transactions and ships logs to the standby server. The standby then stores these and applies the log buffers from the primary. Whilst this results in two copies of the database, this isolates the customer from disk subsystem failures. On failover the standby becomes the new primary. HADR is a good system and one that has been deployed across many customer sites. It does, however, still rely on the Active-Passive database set-up meaning that expensive resources are left idle.

HACMP runs at the operating system level, with a heartbeat signal ensuring that the services are still available. The heartbeat can be implemented over the network, or through a serial connection or even shared disk. If the passive server does not receive regular heartbeats from the active server, it will take over services.

Services are provided to networked requesters over a virtual IP address (VIPA), and it is this which is moved over in the event of take-over processing.

Note that HACMP solutions usually utilise a shared SAN solution, so that the database is as up to date as possible. When the heartbeat is lost, the active server must assume that it has lost connectivity and start closing its services, to ensure that they can be successfully restarted on the passive server.

Similarly, the passive server must wait for a pre-arranged period to ensure that the active server has completed shutdown processing.

The total delay, then, between loss of service on the primary and restoration of service on the secondary can be several minutes.

Note also that takeover does not occur on the first lost heartbeat, but typically the third. This is to ensure that network or server workloads do not cause “false” takeovers.
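The missed-heartbeat rule can be sketched as a simple counter. This is an illustrative model only, not HACMP code or configuration: the class name, the `tick` method and the default limit of three are assumptions taken from the description above.

```python
class HeartbeatMonitor:
    """Runs on the passive server and counts consecutive missed heartbeats."""

    def __init__(self, missed_limit=3):
        # Assumed threshold: take over on the third consecutive miss.
        self.missed_limit = missed_limit
        self.consecutive_missed = 0

    def tick(self, heartbeat_received):
        """Process one heartbeat interval.

        Returns True when takeover processing should start.
        """
        if heartbeat_received:
            # Any heartbeat resets the count, so a brief network or
            # workload spike does not trigger a "false" takeover.
            self.consecutive_missed = 0
        else:
            self.consecutive_missed += 1
        return self.consecutive_missed >= self.missed_limit


monitor = HeartbeatMonitor()
observed = [True, False, True, False, False, False]
print([monitor.tick(beat) for beat in observed])
```

In this run the single miss in the second interval is forgiven, and takeover is signalled only in the final interval, after three misses in a row. The cost of this caution is that detection alone takes at least three heartbeat intervals, before any shutdown or restart time is added.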

HADR is a similar technology to HACMP, but is implemented in the database server rather than the operating system. The reliance on a shared SAN is dropped: the active database ships log buffers to the passive copy, which applies them and so is kept nearly in sync with the primary copy.
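The log-shipping idea can be modelled with two small classes. This is a toy sketch under stated assumptions, not HADR's actual protocol: the `Primary` and `Standby` classes, the record layout and the explicit `ship` call are invented for illustration, and real synchronisation modes are not modelled.

```python
class Standby:
    """Passive copy: replays log records in sequence order."""

    def __init__(self):
        self.data = {}
        self.applied_lsn = 0  # last log sequence number applied

    def receive(self, records):
        for lsn, key, value in records:
            if lsn != self.applied_lsn + 1:
                raise ValueError(f"log gap: expected {self.applied_lsn + 1}, got {lsn}")
            self.data[key] = value
            self.applied_lsn = lsn


class Primary:
    """Active copy: processes transactions and ships its log."""

    def __init__(self, standby):
        self.data = {}
        self.log = []          # (lsn, key, value) records
        self.shipped_lsn = 0   # last record sent to the standby
        self.standby = standby

    def commit(self, key, value):
        lsn = len(self.log) + 1
        self.log.append((lsn, key, value))
        self.data[key] = value
        return lsn

    def ship(self):
        """Send every record the standby has not yet seen."""
        pending = self.log[self.shipped_lsn:]
        self.standby.receive(pending)
        self.shipped_lsn += len(pending)


standby = Standby()
primary = Primary(standby)
primary.commit("balance:42", 100)
primary.commit("balance:42", 80)
primary.ship()
primary.commit("balance:42", 75)  # committed on the primary, not yet shipped
print("standby lag in records:", len(primary.log) - standby.applied_lsn)
```

The one-record lag at the end is what "nearly in sync" means in this model: anything committed on the primary but not yet shipped is missing from the standby if a failover happens at that instant.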

Note that HADR relies on automation to effect the switchover from the primary to the standby.

Peer to Peer Clustering, or 2-Way Replication, allows two or more active database servers to provide read / write access to application data. Data updates are delivered over the replication solution to the other members of the replication cluster in an asynchronous manner – i.e. transaction performance is not impacted, but a finite time exists between the updates appearing on the source and target servers.

As there is no shared locking strategy, the weakness of this solution is that the same data can be updated on two replication cluster members at the same time, leading to data collisions. For example, a room booking system is updated by two people: the CEO and the cleaner. Both book a room for the same time, the cleaner from the London office and the CEO from the Edinburgh office. The CEO's booking commits on the Edinburgh server and is replicated to London just as the cleaner's booking commits on the London server and is replicated to Edinburgh. Which booking ends up being applied will depend on how conflicts are resolved by the replication tooling. Typically, it is the last update that wins, and whilst this could lead to some red faces in our example, the issues are more marked with, e.g., a financial services system.
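A last-update-wins rule for the room booking collision might look like the sketch below. It is an assumption-laden illustration, not the behaviour of any particular replication tool: the `Booking` fields, the integer commit timestamps and the tie-break on site name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Booking:
    room: str
    slot: str
    booked_by: str
    site: str
    committed_at: int  # assumed commit timestamp, in milliseconds


def resolve_last_update_wins(local, incoming):
    """Keep whichever conflicting booking committed last.

    Ties are broken on site name so that every server makes the same
    choice; without a deterministic tie-break the copies could diverge.
    """
    return max(local, incoming, key=lambda b: (b.committed_at, b.site))


ceo = Booking("Boardroom", "Mon 10:00", "CEO", "edinburgh", 1_000)
cleaner = Booking("Boardroom", "Mon 10:00", "cleaner", "london", 1_200)

# Each server resolves the collision when the other site's update arrives.
edinburgh_keeps = resolve_last_update_wins(local=ceo, incoming=cleaner)
london_keeps = resolve_last_update_wins(local=cleaner, incoming=ceo)
print(edinburgh_keeps.booked_by, london_keeps.booked_by)
```

Both servers converge on the cleaner's booking because it committed 200 ms later, and the CEO's booking is silently discarded even though it committed successfully on the Edinburgh server. That lost update is the cost of this policy.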

To overcome this problem, customers will often logically partition their data, so that updates are applied on a regional basis, removing the risk of a collision. Whilst providing a solution to the immediate problem, management of this solution can be awkward with different business units having different service requirements, and changes in regional responsibilities can be difficult to implement.
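Logical partitioning can be pictured as an ownership map consulted before every update. The sketch below is hypothetical: the `RegionRouter` class, the region names and the server names are assumptions for illustration only.

```python
class RegionRouter:
    """Maps each region to the one server allowed to update its data."""

    def __init__(self, owners):
        self.owners = dict(owners)

    def server_for(self, region):
        return self.owners[region]

    def may_update(self, server, region):
        """True only when `server` owns the region being updated."""
        return self.owners[region] == server

    def reassign(self, region, server):
        # A change in regional responsibility means changing this map,
        # and in practice re-pointing every application that writes
        # to the region.
        self.owners[region] = server


router = RegionRouter({"england": "london", "scotland": "edinburgh"})
print(router.may_update("london", "england"))    # London owns English data
print(router.may_update("london", "scotland"))   # must be sent to Edinburgh
```

Because every row has exactly one writing server, two sites can never commit conflicting updates to the same data. The awkwardness described above shows up in `reassign`: the map is trivial to change, but the applications and service agreements built around it are not.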

Examples of replication tools that would support this sort of solution are DPROP and Informatica.

DB2 for z/OS Data Sharing is an all active, shared memory clustering solution based on the zSeries Parallel Sysplex technology. The parallel sysplex coupling facilities are used to cache locking information and buffered data, making these available to all of the members of the cluster.

This is the pinnacle of high availability solutions for DB2, additionally supporting seamless capacity upgrades as well as 99.999% uptime with a mean time to failure of 60 years.
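To put the 99.999% figure in context, an availability percentage can be converted into a yearly downtime budget. The short calculation below is simple arithmetic rather than anything DB2-specific; the function name and the 365.25-day year are choices made for the example.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_minutes_per_year(availability_pct):
    """Maximum downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% uptime allows {downtime_minutes_per_year(target):.2f} minutes of downtime a year")
```

"Five nines" therefore leaves a budget of a little over five minutes a year. Set against the failover delay of several minutes quoted earlier for an active-passive takeover, a single such takeover could use up most of that budget.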

Mainframe technology has been focused for some time now on high availability and zero outage solutions, and the combination of parallel sysplex, DB2 data sharing and DASD mirroring technologies has combined to provide a robust solution platform.

Availability into the future

Looking forward, it is certain that our need for availability will only grow. Downtime and outages will become less and less acceptable to users. In this time of mergers and acquisitions, corporations across the world need to join up their IT systems and work with users in disparate locations. All of this points to a growing need for availability solutions which can span geographies and keep applications available to users across the globe 24/7.

Jul 13 '10 #1
