HP Serviceguard Toolkits for Database Replication Solutions User Guide HP Part Number: 5900-2151 Published: March 2012 Edition: 4
© Copyright 2012 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice.
Contents
1 Introduction
2 Serviceguard toolkit for Oracle Data Guard
    Overview
        Advantages
    Dependencies
    …
4 Support and other resources
    HP authorized resellers
    Documentation feedback
    New and changed information in this edition
    Related information
1 Introduction
The HP Serviceguard Toolkits for Database Replication Solutions User Guide covers the HP Serviceguard toolkit for Oracle Data Guard (ODG toolkit) and the HP Serviceguard toolkit for DB2 High Availability Disaster Recovery (DB2 HADR toolkit).
NOTE: The product name used for the depot is “Serviceguard Disaster Recovery Toolkits for Databases”. However, the product name was changed to “HP Serviceguard Toolkits for Database Replication Solutions” to match the functionality delivered.
2 Serviceguard toolkit for Oracle Data Guard
Overview
The HP Serviceguard toolkit for Oracle Data Guard (ODG toolkit) facilitates easy integration of Oracle Data Guard in an HP Serviceguard cluster for improved high availability and disaster recovery for an Oracle database. This toolkit contains scripts that manage the ODG primary and standby database instances.
Dependencies
The ODG toolkit requires the ECMT Oracle toolkit to provide high availability to a single-instance ODG database. Similarly, in a RAC environment, the ODG toolkit requires the SGeRAC toolkit to provide high availability to ODG RAC database instances.
NOTE: For information about supportability and compatibility with various versions of Serviceguard, toolkits, and HP-UX, see the HP Serviceguard Toolkit Compatibility Matrix available at http://www.hp.com/go/hpux-serviceguard-docs.
The ODG toolkit depends on the ECMT Oracle toolkit; customers who do not already have the ECMT product must purchase it along with the ODG toolkit. The Oracle database is started using the ECMT Oracle toolkit. The Data Guard processes are then started using the ODG toolkit, after which the application is monitored. If either the Oracle database or any of the Data Guard processes fails, the package fails over, because the Oracle database and Data Guard are integrated in a single package.
NOTE: The package parameter START_MODE must be set to mount when the ECMT Oracle toolkit is used in combination with the ODG toolkit. For Active Data Guard, the standby database is started up to the [open] state. Set the ACTIVE_STANDBY parameter to [yes] if you have purchased the optional license that enables the Active Standby functionality in Oracle Data Guard Enterprise Edition. Active Data Guard is supported on Oracle database version 11gR1 or later.
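As a sketch, the relevant attributes in a combined ECMT Oracle/ODG package configuration might look like the following. The ACTIVE_STANDBY attribute path shown here is an assumption for illustration; verify the exact attribute names in the template generated for your toolkit version.

```
# Start the Oracle standby instance only up to mount state
# (required when the ECMT Oracle toolkit is combined with the ODG toolkit)
ecmt/oracle/oracle/START_MODE mount
#------------------------------------------------------------------------
# Hypothetical attribute path: enable the Active Data Guard read-only standby
# (requires the optional Active Data Guard license)
tkit/dataguard/dataguard/ACTIVE_STANDBY yes
```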
Figure 3 Data Guard replication between a RAC primary package and a single-instance stand-alone standby database
Figure 3 (page 10) shows a Data Guard configuration where the primary database is Oracle RAC and the standby is a single-instance database. The RAC primary is configured on nodes 1 and 2 of SG cluster 1.
NOTE: A cluster can contain more than two nodes. This example uses two nodes for clarity.
Figure 4 (page 10) shows a Data Guard configuration where the primary database is configured as a RAC database and the standby database is a single-instance database. Both the primary and standby databases are configured in separate Serviceguard clusters for high availability. The RAC primary is combined with the ODG toolkit and the SGeRAC toolkit in a single package. It is configured on Node 1 and Node 2 in SG Cluster 1.
The standby database is created in the recovery cluster and is also placed on a shared disk. A recovery group is created in the Continentalclusters environment with the following three packages:
1. Primary package: This package is created on the primary cluster, using the ODG toolkit. It brings up the Oracle database on the primary cluster as a primary database and starts monitoring the primary database processes.
fails, the Serviceguard configured on the primary cluster fails over the database to another node within the primary cluster, thus providing high availability to the primary database. Similarly, if the standby database fails, the Serviceguard configured on the recovery cluster allows the database to fail over to another node within the recovery cluster. When the primary cluster is down, the administrator must run the cmrecovercl command on the recovery cluster to bring up the recovery package.
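The recovery action above can be sketched as a short command sequence on the recovery cluster. This is an illustrative outline, not a complete procedure; always confirm the primary cluster is genuinely down (not merely unreachable) before initiating recovery.

```
# On a node of the recovery cluster, after confirming that the
# primary cluster is down:
cmrecovercl        # start the packages of the configured recovery groups
cmviewcl -v        # verify that the recovery package is up and running
```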
start_script_timeout parameter in the package configuration should be appropriately specified. Restoring the clusters in a Continentalclusters environment to their original state is a manual process. Perform the following steps to restore the clusters to their original state:
1. Halt the recovery package.
2. Resynchronize the primary database from the standby database.
Figure 8 Single-instance Data Guard configuration in Metrocluster with the standby database residing outside the Metrocluster
NOTE: This configuration is supported in both single-instance and RAC environments. For clarity, the packages in Figure 8 (page 15) are shown for a single-instance database.
In Figure 8 (page 15), Metrocluster is configured with two data centers: Data Center 1 as the primary site, and Data Center 2 as the recovery site.
to the second site. If the standby database is configured on the second site of the Metrocluster, it becomes a redundant configuration. If there is a disaster and Data Center 1 is down, the primary database instance fails over to Data Center 2 and continues to function as a primary database. There are no problems in starting the primary database at Data Center 2 using the replicated data from the shared disk, because Metrocluster has already replicated the data from Data Center 1 to Data Center 2.
Three data center configuration
Figure 10 Single-instance Data Guard setup in a Continentalclusters environment where the primary cluster is configured as a Metrocluster
Figure 10 (page 17) describes a Continentalclusters setup with two clusters spread over three different sites. The primary cluster is configured as a Metrocluster spread over two sites that are geographically dispersed within the confines of a metropolitan area.
configurations are not supported within a Continentalclusters setup. However, the ODG toolkit in Continentalclusters does not restrict you from configuring standby databases that are placed outside the Continentalclusters environment.
NOTE: This configuration is supported in both single-instance and RAC environments. For clarity, the packages in Figure 10 (page 17) are shown for a single-instance database.
ODG toolkit configurations have the following benefits:
• High availability for primary/secondary databases
• Automation of start/stop of databases

Benefits of the Oracle Data Guard toolkit configuration in each environment:
• Extended Distance Cluster environment: Uses only one Serviceguard cluster. It stretches a Serviceguard cluster across data centers up to 100 km apart and provides protection against site outages.
• Continentalclusters environment: Provides push-button automated role takeover of the ODG database.
Installing and uninstalling the Oracle Data Guard toolkit
The ODG toolkit is part of the HP Serviceguard Toolkits for Database Replication Solutions product and is installed when you install HP Serviceguard Toolkits for Database Replication Solutions.
NOTE: The product name used for the depot is “Serviceguard Disaster Recovery Toolkits for Databases”. However, the product name was changed to “HP Serviceguard Toolkits for Database Replication Solutions” to match the functionality delivered.
Table 2 Files created on installation of the HP Serviceguard toolkit for Oracle Data Guard
• SGAlert.sh: Alert mail generation script.
• Main script in a single-instance environment (hadg.sh): Contains a list of internally used variables and functions that support the starting, stopping, and monitoring of an ODG instance. This script is called by tkit_module.sh.
Table 4 Module scripts of the HP Serviceguard toolkit for Oracle Data Guard
• Toolkit module script (tkit_module.sh), available in /etc/cmcluster/scripts/tkit/dataguard: This script is called by the Master Control Script and acts as an interface between the Master Control Script and the toolkit interface script (hadg.sh/hadg_rac.sh). It is also responsible for calling the toolkit Configuration File Generator Script (described below).
Table 5 Package attributes (continued)
START_STANDBY_AS_PRIMARY: Specifies whether the standby database must be started as the primary database. It must be set to [yes] in the recovery package of the Continentalclusters recovery group. When the primary package goes down, the user must run the cmrecovercl command to bring up the recovery package on the recovery cluster.
Single-instance environment
The following sample configuration uses the installation directory mode of operation. This example of ODG package setup and configuration is for an ODG configuration using LVM. It illustrates the creation of a package for ODG in a single-instance environment.
1. Creating a package configuration
• Create two packages: one for the primary database on the primary cluster and the other for the standby database on the standby cluster.
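Step 1 can be sketched with the cmmakepkg command, which generates a modular package configuration template combining the ECMT Oracle toolkit and ODG toolkit modules. The module names match those used in the sample attributes that follow; the output file name dgpkg.conf is illustrative.

```
# Generate a modular package configuration template that includes
# both the ECMT Oracle toolkit and the ODG toolkit modules:
cmmakepkg -m ecmt/oracle/oracle -m tkit/dataguard/dataguard dgpkg.conf
```

Edit the generated file to set the attributes shown in the sample below, then validate and apply it with cmcheckconf and cmapplyconf.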
# Define the instance type
#
ecmt/oracle/oracle/INSTANCE_TYPE database
#------------------------------------------------------------------------
# Define Oracle home
#
ecmt/oracle/oracle/ORACLE_HOME /var/orahome
#------------------------------------------------------------------------
# Define user name of Oracle database administrator
#
ecmt/oracle/oracle/ORACLE_ADMIN oracle
#------------------------------------------------------------------------
# Define oracle session name
#
ecmt/oracle/oracle/SID_NAME ORC
#
#ecmt/oracle/oracle/LISTENER_RESTART
#------------------------------------------------------------------------
NOTE: The following are the service commands for the package:
service_name oracle_service_test
service_cmd "$SGCONF/scripts/ecmt/oracle/tkit_module.sh oracle_monitor"
service_restart none
service_fail_fast_enabled no
service_halt_timeout 300
service_name oracle_listener_service_test
service_cmd "$SGCONF/scripts/ecmt/oracle/tkit_module.
# "vg" is used to specify which volume groups are used by this package.
#
vg vgora
#------------------------------------------------------------------------
# "fs_name", "fs_directory", "fs_mount_opt", "fs_umount_opt", "fs_fsck_opt",
# and "fs_type" specify the file systems which are used by this package.
#------------------------------------------------------------------------
# "run_script_timeout" is the number of seconds allowed for the package to start.
# "halt_script_timeout" is the number of seconds allowed for the package to halt.
#
run_script_timeout 600
halt_script_timeout 700
NOTE: "halt_script_timeout" must be greater than the sum of all the individual "service_halt_timeout" values of the "service_cmds". In the SGeRAC toolkit, this value is 600 by default.
tkit/dataguard/dataguard/START_STANDBY_AS_PRIMARY no
#------------------------------------------------------------------------
# Define e-mail address for sending alerts
#
#tkit/dataguard/dataguard/ALERT_MAIL_ID
#------------------------------------------------------------------------

Adding the package to the Serviceguard cluster
After the setup is complete, add the package to the Serviceguard cluster, and then start the cluster.
$ cmapplyconf -P dgpkg.
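The halt_script_timeout rule noted in the sample configuration can be checked with a small shell sketch. The two service_halt_timeout values of 300 seconds are taken from the service command example earlier in this section.

```shell
# Verify that halt_script_timeout exceeds the sum of the individual
# service_halt_timeout values configured for the package's services.
halt_script_timeout=700
sum=0
for t in 300 300; do          # one service_halt_timeout per configured service
  sum=$((sum + t))
done
if [ "$halt_script_timeout" -gt "$sum" ]; then
  echo "halt_script_timeout OK (sum of service halt timeouts: $sum)"
else
  echo "increase halt_script_timeout (sum of service halt timeouts: $sum)"
fi
```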
Single-instance environment
NOTE: In this example, the package name is considered to be dgpkg, the package directory is /etc/cmcluster/pkg/dgpkg, and the ORACLE_HOME is configured as /orahome.
1. To disable the failover of the package, enter the following command at the prompt:
$ cmmodpkg -d dgpkg
2. To pause the monitor script, create an empty file /etc/cmcluster/pkg/dgpkg/dataguard.debug by entering the command:
$ touch /etc/cmcluster/pkg/dgpkg/dataguard.
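The pause mechanism depends only on the presence of the dataguard.debug file. The following sketch mimics the check the monitor performs; a temporary directory stands in for the real package directory /etc/cmcluster/pkg/dgpkg.

```shell
# The toolkit monitor skips its checks while <package_dir>/dataguard.debug exists.
PKG_DIR=$(mktemp -d)                  # stand-in for /etc/cmcluster/pkg/dgpkg
touch "$PKG_DIR/dataguard.debug"      # pause monitoring
if [ -f "$PKG_DIR/dataguard.debug" ]; then
  echo "monitoring paused"
fi
rm -f "$PKG_DIR/dataguard.debug"      # remove the file to resume monitoring
if [ ! -f "$PKG_DIR/dataguard.debug" ]; then
  echo "monitoring resumed"
fi
```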
database versions on all package nodes of the cluster. If there are any inconsistencies, cluster verification logs appropriate warning messages. It does not lead to a package validation failure during package apply or package check. In a single-instance environment, consider a two-node cluster where both nodes have Serviceguard A.11.20, ECMT B.06.00, and the same Oracle database version, but different ODG toolkit versions. To check the package configuration, use the cmcheckconf command:
node1# cmcheckconf -P pkg.
• When using the ODG Broker toolkit in a Continentalclusters environment, the “Fast Start Failover” feature of ODG Broker must be disabled. In case of a disaster at the primary site, the “Fast Start Failover” feature of ODG Broker enables automatic failover of the primary database to an available standby database. This may lead to data integrity issues when the toolkit attempts to fail over the primary to a different standby.
3 Serviceguard toolkit for DB2 High Availability Disaster Recovery
Overview
The HP Serviceguard toolkit for DB2 High Availability Disaster Recovery (DB2 HADR toolkit) enables you to configure the DB2 primary and standby databases as two Serviceguard packages. It provides high availability for the DB2 database and role management assistance, such as role takeover and role switch, for DB2 HADR. The DB2 HADR toolkit handles role takeover automatically.
• To avoid data loss, the peer window value must be set appropriately during DB2 HADR configuration. This value must be four times the value of the monitor interval, to ensure that the role takeover is initiated within the peer window. Use Network Time Protocol to ensure time synchronization between the nodes where the primary and standby databases reside.
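The peer-window guideline above can be expressed as a simple calculation. The 30-second monitor interval is an example value, not a toolkit default; HADR_PEER_WINDOW is the DB2 database configuration parameter that holds the peer window.

```shell
# Derive the DB2 HADR peer window from the toolkit monitor interval,
# following the guideline: peer window = 4 x monitor interval.
MONITOR_INTERVAL=30                        # seconds (example value)
PEER_WINDOW=$((4 * MONITOR_INTERVAL))
echo "set HADR_PEER_WINDOW to at least $PEER_WINDOW seconds"
```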
Using the DB2 HADR toolkit
Packaging methods
DB2 HADR toolkit packages can be configured using either a combined or a separate method of packaging.
• Combined packaging: This method is recommended if an instance comprises only one database. In this method of packaging, a package consists of the ECMT DB2 and DB2 HADR modules and manages (starts, stops, and monitors) both the instance and the HADR.
• Separate packaging: This method is recommended if an instance comprises multiple databases.
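As a sketch, the two packaging methods differ in which modules go into each generated package. The module names below are assumptions based on the script paths and module names shown later in this chapter; the output file names are illustrative.

```
# Combined packaging: one package holds both the ECMT DB2 and DB2 HADR modules
cmmakepkg -m ecmt/db2/db2 -m tkit/db2hadr/db2hadr hadrpkg.conf

# Separate packaging: the instance package and each per-database HADR package
# are generated separately; the HADR packages then depend on the instance package
cmmakepkg -m ecmt/db2/db2 db2instpkg.conf
cmmakepkg -m tkit/db2hadr/db2hadr hadrpkg1.conf
```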
configured to run either on Node 3 or Node 4 and is currently running on Node 3. If the standby package fails on Node 3, it fails over to Node 4. The primary database package manages the primary database and the primary HADR. Similarly, the standby database package manages the standby database and the standby HADR. Also, the standby package automatically takes over the role of the primary database, if the primary database fails or the node on which the primary database is currently running crashes.
database clients reconnect to this database using the Automatic Client Reroute feature of DB2 HADR. If the ROLE_MANAGEMENT attribute is set to [no], the behavior is the same as in Event 2.
Event 4: Primary package is manually halted
If the primary package is halted using the cmhaltpkg command, the standby package does not perform a role takeover. The standby package continues to run and logs a message in the package log stating that the primary package is manually halted.
Figure 14 HA to primary database
This configuration provides high availability (HA) to the primary database. In Figure 14 (page 39), the DB2 primary database is configured in a volume group shared between Node1 and Node2 in a Serviceguard cluster. The standby database is running on Node3, placed outside the cluster. The primary package is configured to run on either Node1 or Node2 and is currently running on Node1. The standby database is not packaged using the DB2 HADR toolkit.
Figure 15 Primary and standby packages in the same cluster to perform only role management
In this configuration, primary and standby packages are configured in the same cluster to provide automatic role management at the database level. Here, if any one of the primary databases fails, the corresponding standby database package performs a role takeover without impacting other databases. In Figure 15, HADR is configured between the primary and standby databases.
On Node2, if the HADR configured for the standby database Sample1 is down, Standby Package1 fails. Similarly, if the HADR configured for the standby database Sample2 is down, Standby Package2 fails. In either of these cases, the failure of one package does not impact other packages configured for the same instance. In this case, the standby package logs a failure message in the package log and sends an email if the ALERT_MAIL_ID package attribute is set. The primary HADR package continues to run.
This scenario is a continuation of events 2, 3, and 4. After the initial role takeover, Primary Package1, which was originally configured as the primary package, attempts to start as the primary package. It fails to come up as the primary package because a role takeover was performed when Primary Package1 went down, and the respective Standby Package1 is now running as the primary HADR. In this case, the package attempts to come up as standby.
running on Node2. Similarly, ECMT DB2 Package2 is configured to run either on Node3 or on Node4, and is currently running on Node3. For primary and standby packages, the package dependency and package priority must be configured in such a way that if any of the packages fail, all the packages must failover to the alternate node. While you set up the package for failover, you must set the dependency and priority attributes for the package.
Consider that the ROLE_MANAGEMENT attribute is set to yes. If Node2 crashes while it is in the online state, all the packages configured on Node2 fail over to the alternate node, Node1. The standby HADR package uses the by force option to perform a role takeover, and thus becomes the new primary database. All database clients reconnect to this database using the Automatic Client Reroute feature of DB2 HADR.
◦ tkit/db2hadr/db2hadr and tkit/db2hadr/hadr are the names of the DB2 HADR toolkit modules.
◦ pkg.conf is the name of the package configuration file.
4. Edit the package configuration file, and apply it using the cmapplyconf command. Edit the following attributes manually in this file before creating the package:
package_name: The package name must be unique in the cluster.
package_type: Package must be a failover package.
HADR_IP: Set the IP address used for performing the role switch. Provide an IP address from the subnet that is monitored by Serviceguard.
NOTE: This IP address should not be used to configure HADR, and HADR must not use this IP for any purpose.
The format of this value is <IP address>:<subnet>. For example:
tkit/db2hadr/db2hadr/HADR_IP 10.76.1.200:10.76.1.0
where:
• 10.76.1.200 is the IP address
• 10.76.1.0 is the network subnet.
service_name: Name of the service that Serviceguard monitors while the package is up. This name must be unique for both primary and standby packages in a Serviceguard cluster. The default values for the DB2 and DB2 HADR services are db2_service and db2hadr_service, respectively.
service_cmd: The command line used to start the service. For the DB2 service: $SGCONF/scripts/ecmt/db2/tkit_module.sh db2_monitor. For the DB2 HADR service: $SGCONF/scripts/tkit/db2hadr/tkit_module.sh db2hadr_monitor.
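A minimal sketch of the service attributes for a separate-method HADR package, using the default names described above. The service_restart and service_halt_timeout values are illustrative, not defaults taken from the template.

```
service_name                 db2hadr_service
service_cmd                  "$SGCONF/scripts/tkit/db2hadr/tkit_module.sh db2hadr_monitor"
service_restart              none
service_halt_timeout         300
```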
5. After the package configuration file is edited, save the file.
6. To validate the package configuration file, run the command:
cmcheckconf -P pkg.conf
7. If the cmcheckconf command succeeds, apply the configuration by running the command:
cmapplyconf -P pkg.conf
8. Confirm if you want to modify the package configuration. The default value is [yes].
9. To enable auto run and package failover, run the following commands:
$ cmmodpkg -e -n <node1> -n <node2> hadrpkg
and
$ cmmodpkg -e hadrpkg
10.
1. To enable the maintenance mode, set the MAINTENANCE_FLAG attribute to [yes] in the package configuration file before running the cmapplyconf command.
2. To start the maintenance mode for the HADR package, create the hadr.debug file in the TKIT_DIR of the HADR package.
NOTE:
• If you set the dependent ECMT DB2 package in the maintenance mode (create db2.
Check syslog for more details. cmcheckconf: Verification completed. No errors found. Use the cmapplyconf command to apply the configuration.
Troubleshooting
This section explains some of the problem scenarios that you might encounter while working with the DB2 HADR toolkit in an HP Serviceguard cluster.
Problem scenario: The package log contains an error message:
Host key verification failed. Lost connection.
Possible cause: The SSH connection without password is not configured properly.
Recommended action: Configure the SSH connection without password properly. To verify the possible cause:
1.
Recommended action:
1. Run the db2 takeover hadr on db sample by force command on the machine where the state of the DB2 HADR is one of the following: remote catchup pending, peer, or disconnected peer.
2. After the state changes to “peer”, run the db2 takeover hadr on db sample command on the standby.

Limitations
This section lists the limitations of the DB2 HADR toolkit in an HP Serviceguard cluster:
• Start the standby package before you start the primary package.
4 Support and other resources
Information to collect before contacting HP
Be sure to have the following information available before you contact HP:
• Software product name
• Hardware product model number
• Operating system type and version
• Applicable error message
• Third-party hardware or software
• Technical support registration number (if applicable)
How to contact HP
Use the following methods to contact HP technical support:
• Se
Warranty information
HP will replace defective delivery media for a period of 90 days from the date of purchase. This warranty applies to all Insight Management products.
HP authorized resellers
For the name of the nearest HP authorized reseller, see the following sources:
• In the United States, see the HP U.S. service locator website: http://www.hp.com/service_locator
• In other locations, see the Contact HP worldwide website: http://www.hp.
{ }       In command syntax statements, these characters enclose required content.
|         The character that separates items in a linear list of choices.
...       Indicates that the preceding element can be repeated one or more times.
WARNING   An alert that calls attention to important information that, if not understood or followed, results in personal injury.
A To configure SSH connection without password for root user between two nodes
This section describes how to configure an SSH connection without password for the root user between two nodes. In this example, DB2 HADR is configured using the host names of the two nodes (Node2 and Node3), as shown in the following db2 command result:
db2 get db cfg for | grep -i hadr
In the following output, Node2 and Node3 are the host names of the nodes that are used to configure DB2 HADR.
Node2# ssh Node3 cat /.ssh/id_rsa.pub >> /.ssh/authorized_keys
Node2# ssh Node3 cat /.ssh/id_dsa.pub >> /.ssh/authorized_keys
Node2# scp /.ssh/authorized_keys Node3:.ssh/authorized_keys
NOTE: Provide the root user’s password when asked.
Node2# exec /usr/bin/ssh-agent $SHELL
Node2# /usr/bin/ssh-add
Identity added: /.ssh/id_rsa (/.ssh/id_rsa)
Identity added: /.ssh/id_dsa (/.ssh/id_dsa)
Node2# ssh Node2 ls /.
Offending key for IP in /home/user/.ssh/known_hosts:6
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
3. A warning message Offending key for IP in /home/user/.ssh/known_hosts is displayed.
4. Remove the key number 6 from /home/user/.ssh/known_hosts, and then copy it to a temporary file.
5. To add a new key to /home/user/.ssh/known_hosts.
B Sample package configuration file for the DB2 HADR standby package created using the combined method of packaging
This section provides a sample package configuration file for the DB2 HADR standby package created using the combined method of packaging:
# **********************************************************************
# ****** HIGH AVAILABILITY PACKAGE CONFIGURATION FILE (template) *******
# **********************************************************************
# ******* Note: This file MUST be
C Sample package configuration file for the DB2 HADR standby package created using the separate method of packaging
This section provides a sample package configuration file for the DB2 HADR standby package created using the separate method of packaging:
# **********************************************************************
# ****** HIGH AVAILABILITY PACKAGE CONFIGURATION FILE (template) *******
# **********************************************************************
# ******* Note: This file MUST be
Glossary
ECMT    Enterprise Cluster Master Toolkit
EDC     Extended Distance Cluster
HA      High Availability
HADR    High Availability Disaster Recovery
MAA     Maximum Availability Architecture
MNP     Multi Node Package
ODG     Oracle Data Guard
RAC     Oracle Real Application Clusters
vg      Volume Group
Index
A
adding package to SG cluster, 30
C
cluster verification, 31
configuring ODG toolkit
    RAC environment, 28
    single-instance environment, 24
Continentalclusters environment
    configuration, 1
O
ODG maintenance
    RAC environment, 31
    single-instance environment, 31
ODG toolkit
    advantages, 6
    configuration benefits, 18
    installation, 20
    limitations, 32
    maintenance, 30
    setting up Oracle Data Guard toolkit, 19
    troubleshooting, 32
    uninstallation, 20
Oracle Data Guard
    configuring multiple instances, 18
    overview, 6