High Availability Technologies for DB2

Availability for a brave new world

With the advent of Software as a Service (SaaS), more businesses rely on the ability to access their business data through web-based applications. In addition to the rise of SaaS and Cloud Computing, our businesses are increasingly operating on a global scale. Where once you could schedule maintenance updates for Sunday night, that same window now falls in the working day of users on the other side of the globe.

When downtime is unplanned, however, these issues multiply tenfold. Unplanned outages are far more visible to users and to the public at large, with potential ramifications for revenue, brand image and customer satisfaction.

In this paper we will look at the various solutions to the application availability issue for DB2 databases and how they meet the demands of our ever-changing global operations.

Availability solutions for DB2 databases
Let's look first at the newest high availability solution to enter the market: DB2 pureScale.

DB2 pureScale is a new optional DB2 feature that allows multiple database servers in a system to share a common set of disks, providing both scalability and availability.

This new technology includes:
• Automatic workload balancing to ensure that no node in the system is overloaded. DB2 routes transactions or connections to the least heavily used server. This workload balancing is hidden from the end user, and even from applications, because the DB2 client handles all of it. The client periodically checks the workload levels and re-routes work to different servers. Balancing can occur at either the transaction or the connection level. Transaction-level support was added because many customers and ERP systems use connection pooling, and without it their workloads might never be moved.
• DB2 pureScale is built on the most reliable UNIX system available, Power Systems. Other platforms will be available in the future. The DB2 and Power Systems teams worked very closely on DB2 pureScale to ensure that it is optimized for AIX at all levels, be it memory, networking or storage.
• The technology for globally sharing locks and memory is based on technology from z/OS which has a great track record of being the most reliable and scalable architecture available.
• Tivoli System Automation has been integrated deeply into DB2 pureScale. It is installed and configured as part of the DB2 installation process, so DBAs and system administrators never even need to know it's there. The DB2 fixpaks also include and apply any Tivoli updates, so DBAs and system administrators never need to learn another software product.
• The networking infrastructure leverages InfiniBand, and all additional clustering software is included as part of the DB2 pureScale installation. This technology has allowed us to avoid many scaling problems other vendors have run into.
• The core of the system is a shared disk architecture.
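
The client-side workload balancing described in the first bullet can be pictured with a minimal Python sketch. It is not the DB2 client's actual algorithm: the `Member` class, the `load` metric and the `per_txn_cost` figure are all invented for illustration. It only shows the idea of sending each transaction to whichever member is least heavily used at that moment.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """A hypothetical cluster member as seen by the client."""
    name: str
    load: float  # assumed load metric: 0.0 is idle, 1.0 is saturated


def pick_member(members):
    """Return the least heavily used member."""
    return min(members, key=lambda m: m.load)


def route_transactions(transactions, members, per_txn_cost=0.25):
    """Route each transaction to the least loaded member at that moment.

    per_txn_cost is an invented figure, used only so that the load
    changes as work is placed.
    """
    placement = {}
    for txn in transactions:
        target = pick_member(members)
        placement[txn] = target.name
        target.load += per_txn_cost
    return placement


members = [Member("member0", 0.5), Member("member1", 0.25), Member("member2", 0.75)]
placement = route_transactions(["t1", "t2", "t3"], members)
print(placement)
```

Because the decision is made per transaction rather than per connection, work from a single pooled connection can still land on different members, which is the reason the article gives for transaction-level support.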

There are a number of high availability & disaster recovery solutions which have been in the marketplace for some time.

Active-passive clustering is a good general purpose high availability solution within a local environment. It typically provides a warm standby solution – i.e. an outage in the primary server is detected by the backup, which then takes over. The main stumbling block with this method is that it cannot work over a long distance and so is really only suitable for a single location solution.

With an active-passive clustering solution an organisation typically has an active or primary server and a passive or standby server. The TCO of this solution can be relatively high with expensive hardware resources sitting idle on the standby server. In addition to the warm standby server some organisations set up an additional standby within a separate DR site.


A heartbeat between the servers detects when the primary server goes down, and services are then moved across to the failover server. There is generally a short outage between the primary server failing and the standby server detecting the change in state and taking over.

Despite these drawbacks, this is a solution used by many organisations across Europe and the US, especially within the banking sector.

Examples of active-passive implementations are HACMP on AIX and HADR in DB2 UDB for Linux, UNIX and Windows.

HADR, or High Availability Disaster Recovery, for DB2 from IBM works in a similar way, with a primary server and a standby server. The difference here is that the primary server processes transactions and ships logs to the standby server. The standby stores the log data it receives and applies it to its own copy of the database. Although this results in two copies of the database, it isolates the customer from disk subsystem failures. On failover the standby becomes the new primary. HADR is a good system and one that has been deployed across many customer sites. It does, however, still rely on the active-passive database set-up, meaning that expensive resources are left idle.

HACMP runs at the operating system level, with a heartbeat signal ensuring that the services are still available. The heartbeat can be implemented over the network, or through a serial connection or even shared disk. If the passive server does not receive regular heartbeats from the active server, it will take over services.

Services are provided to networked requesters over a virtual IP address (VIPA), and it is this which is moved over in the event of take-over processing.

Note that HACMP solutions usually utilise a shared SAN solution, so that the database is as up to date as possible. When the heartbeat is lost, the active server must assume that it has lost connectivity and start closing its services, to ensure that they can be successfully restarted on the passive server.

Similarly, the passive server must wait for a pre-arranged period to ensure that the active server has completed shutdown processing.

The total delay, then, between loss of service on the primary and restoration of service on the secondary can be several minutes.

Note also that takeover does not occur on the first lost heartbeat, but typically the third. This is to ensure that network or server workloads do not cause “false” takeovers.
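
The missed-heartbeat rule can be sketched in a few lines of Python. This is an illustration only, not HACMP code: the `HeartbeatMonitor` class, the one-second interval and the explicit timestamps are assumptions made for the example, while the limit of three missed heartbeats follows the typical figure given above.

```python
class HeartbeatMonitor:
    """Runs on the passive server and counts missed heartbeats."""

    def __init__(self, interval=1.0, missed_limit=3):
        self.interval = interval          # assumed seconds between heartbeats
        self.missed_limit = missed_limit  # take over on the third miss
        self.last_seen = 0.0

    def record_heartbeat(self, now):
        """Note the time at which a heartbeat arrived from the active server."""
        self.last_seen = now

    def missed(self, now):
        """Number of whole heartbeat intervals elapsed since the last one."""
        return int((now - self.last_seen) // self.interval)

    def should_take_over(self, now):
        """True once enough consecutive heartbeats have been missed."""
        return self.missed(now) >= self.missed_limit


monitor = HeartbeatMonitor(interval=1.0, missed_limit=3)
monitor.record_heartbeat(now=10.0)
checks = [(t, monitor.missed(t), monitor.should_take_over(t)) for t in (11.0, 12.0, 13.0)]
print(checks)
```

Timestamps are passed in explicitly rather than read from the clock so the behaviour is easy to follow: one or two missed heartbeats are tolerated, and only the third triggers takeover.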

HADR is a similar technology to HACMP, but is implemented in the database server rather than the operating system. The reliance on a shared SAN is dropped: the active database ships log buffers to the passive copy, which applies them and so is kept nearly in sync with the primary copy.
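
As a rough picture of log shipping, the Python sketch below has a primary apply each change locally and ship the log record to a standby, which replays it later. It is a toy model under stated assumptions, not how HADR is implemented: the `Primary` and `Standby` classes, the record layout and the in-memory "shipping" are all hypothetical.

```python
class Database:
    """A toy database: a dict of rows plus the last applied log sequence number."""

    def __init__(self):
        self.rows = {}
        self.applied_lsn = 0

    def apply(self, record):
        lsn, key, value = record
        self.rows[key] = value
        self.applied_lsn = lsn


class Standby(Database):
    def __init__(self):
        super().__init__()
        self.pending = []  # log records received but not yet replayed

    def receive(self, record):
        self.pending.append(record)

    def replay(self):
        """Apply every received log record, in order."""
        while self.pending:
            self.apply(self.pending.pop(0))


class Primary(Database):
    def __init__(self, standby):
        super().__init__()
        self.standby = standby
        self.next_lsn = 1

    def commit(self, key, value):
        """Apply the change locally, then ship the log record to the standby."""
        record = (self.next_lsn, key, value)
        self.next_lsn += 1
        self.apply(record)
        self.standby.receive(record)


standby = Standby()
primary = Primary(standby)
primary.commit("acct1", 100)
primary.commit("acct2", 250)
lag_before_replay = primary.applied_lsn - standby.applied_lsn
standby.replay()
print(lag_before_replay, standby.rows == primary.rows)
```

The gap between receiving and replaying is what "nearly in sync" means here: until `replay()` runs, the standby holds the log records but its copy of the data is behind the primary.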

Note that HADR relies on automation to effect the switchover from the primary to the standby.

Peer to Peer Clustering, or 2-Way Replication allows two or more active database servers to provide read / write access to application data. Data updates are delivered over the replication solution to the other members of the replication cluster in an asynchronous manner – i.e. transaction performance is not impacted, but a finite time exists between the updates appearing on the source and target servers.

As there is no shared locking strategy, the weakness of this solution is that the same data can be updated on two replication cluster members at the same time, leading to data collisions. For example, suppose a room booking system is updated by two people, the CEO and the cleaner. Both book a room for the same time, the cleaner from the London office and the CEO from the Edinburgh office. The CEO's booking commits on the Edinburgh server and is replicated to the London office just as the cleaner's booking commits on the London server and is replicated to Edinburgh. Which booking ends up being applied will depend on how conflicts are resolved by the replication tooling. Typically, it is the last update that wins, and whilst this could lead to some red faces in our example, the issues are more marked with, e.g., a financial services system.
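
The room booking collision can be reproduced with a short Python sketch of a "last update wins" rule. The `Booking` class, the commit timestamps and the tie-break on the booker's name are assumptions made for the example; real replication tools offer their own conflict-resolution options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Booking:
    room: str
    slot: str
    booked_by: str
    committed_at: float  # invented commit timestamp, in seconds


def last_update_wins(local, incoming):
    """Keep whichever booking committed last.

    Ties are broken on booked_by so that both sites reach the same answer.
    """
    return max(local, incoming, key=lambda b: (b.committed_at, b.booked_by))


ceo = Booking("Boardroom", "Mon 10:00", "CEO", committed_at=100.0)
cleaner = Booking("Boardroom", "Mon 10:00", "cleaner", committed_at=100.5)

# Each site holds its own booking and then receives the other via replication.
edinburgh_result = last_update_wins(local=ceo, incoming=cleaner)
london_result = last_update_wins(local=cleaner, incoming=ceo)
print(edinburgh_result.booked_by, london_result.booked_by)
```

Both sites converge on the same booking, but only because one update is silently discarded: the CEO's commit succeeded locally and is later overwritten.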

To overcome this problem, customers will often logically partition their data so that updates are applied on a regional basis, removing the risk of a collision. Whilst this solves the immediate problem, managing it can be awkward: different business units have different service requirements, and changes in regional responsibilities can be difficult to implement.
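
A minimal sketch of regional partitioning, in Python, is to route every update to the one server that owns the data, whichever office the user sits in. The `ROOM_OWNER` mapping and the server names are hypothetical, and the example assumes the room in question belongs to the London office.

```python
# Hypothetical mapping from the office that owns a room to the only
# server allowed to accept updates for that room.
ROOM_OWNER = {
    "London": "london-db",
    "Edinburgh": "edinburgh-db",
}


def server_for_update(room_office):
    """Return the server that owns bookings for rooms in room_office."""
    try:
        return ROOM_OWNER[room_office]
    except KeyError:
        raise ValueError(f"no owning server configured for {room_office!r}") from None


# The CEO (in Edinburgh) and the cleaner (in London) both book a London room,
# so both updates go to the same server and are serialised by its locking.
ceo_target = server_for_update("London")
cleaner_target = server_for_update("London")
print(ceo_target, cleaner_target)
```

The management burden mentioned above shows up in the mapping itself: every change in regional responsibility means changing which server owns which data.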

Examples of replication tools that would support this sort of solution are DPROP and Informatica.

DB2 for z/OS Data Sharing is an all active, shared memory clustering solution based on the zSeries Parallel Sysplex technology. The parallel sysplex coupling facilities are used to cache locking information and buffered data, making these available to all of the members of the cluster.

This is the pinnacle of high availability solutions for DB2, additionally supporting seamless capacity upgrades as well as 99.999% uptime with a mean time to failure of 60 years.
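
To put the 99.999% figure in perspective, the short Python calculation below converts an availability percentage into the downtime it allows per year. The function name is ours, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_pct):
    """Return the yearly downtime implied by an availability percentage."""
    return (100.0 - availability_pct) / 100.0 * MINUTES_PER_YEAR


for pct in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

At 99.999% the allowance is a little over five minutes a year, compared with several hours at 99.9%.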

Mainframe technology has been focused for some time now on high availability and zero outage solutions, and the combination of parallel sysplex, DB2 data sharing and DASD mirroring technologies has combined to provide a robust solution platform.

Availability into the future

Looking forward, it is certain that our need for availability will only grow. Downtime and outages will become less and less acceptable to users. In this time of mergers and acquisitions, corporations across the world need to join up their IT systems and work with users in disparate locations. All of this points to a growing need for availability solutions which can span geographies and keep applications available to users across the globe 24/7.
Jul 13 '10 #1