Veritas Storage Foundation 5.1 SP1 for Oracle RAC Administrator's Guide HP-UX 11i v3 HP Part Number: 5900-1512 Published: April 2011 Edition: 1.
© Copyright 2011 Hewlett-Packard Development Company, L.P. Legal Notices Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice.
Contents

Technical Support .......... 4

Section 1: SF Oracle RAC concepts and administration .......... 15
Chapter 1: Overview of Veritas Storage Foundation for Oracle RAC .......... 17
    About Veritas Storage Foundation for Oracle RAC
    Benefits of SF Oracle RAC
Chapter 2: Administering SF Oracle RAC and its components .......... 73
    Administering SF Oracle RAC .......... 73
        Setting the environment variables .......... 74
        Starting or stopping SF Oracle RAC on each node .......... 75
        Applying Oracle patches
    cpsadm command
    About administering the coordination point server
        Refreshing registration keys on the coordination points for server-based fencing
        Replacing coordination points for server-based fencing in an online cluster
    Installer cannot create UUID for the cluster
    Troubleshooting installation and configuration check failures
        Time synchronization checks failed with errors
        System architecture checks failed with errors
        Operating system and patch level synchronization checks failed with errors
        CPU frequency checks failed with errors
    Node is unable to join cluster while another node is being ejected
    System panics to prevent potential data corruption
        How vxfen driver checks for preexisting split-brain condition
    Cluster ID on the I/O fencing key of coordinator disk does not match the local cluster's ID
    Removing Oracle Clusterware if installation fails
    Troubleshooting the Virtual IP (VIP) configuration
    Loss of connectivity to OCR and voting disk causes the cluster to panic
    Troubleshooting Oracle Clusterware health check warning messages
    Troubleshooting ODM

Section 3: Reference .......... 265
Appendix A: List of SF Oracle RAC health checks .......... 267
    LLT health checks
    LMX health checks
    I/O fencing health checks
    PrivNIC health checks
Section 1: SF Oracle RAC concepts and administration

■ Chapter 1. Overview of Veritas Storage Foundation for Oracle RAC
■ Chapter 2. Administering SF Oracle RAC and its components
Chapter 1 Overview of Veritas Storage Foundation for Oracle RAC

This chapter includes the following topics:

■ About Veritas Storage Foundation for Oracle RAC
■ How SF Oracle RAC works (high-level perspective)
■ Component products and processes of SF Oracle RAC
■ About preventing data corruption with I/O fencing
■ Periodic health evaluation of the clusters

About Veritas Storage Foundation for Oracle RAC

Veritas Storage Foundation™ for Oracle® RAC (SF Oracle RAC) leverages proprietary storage management and high availability technologies to enable robust, manageable, and scalable deployment of Oracle RAC on UNIX platforms.
■ Support for file system-based management. SF Oracle RAC provides a generic clustered file system technology for storing and managing Oracle data files as well as other application data.
■ Support for high availability of cluster interconnects.
■ Prevention of data corruption in split-brain scenarios with robust SCSI-3 Persistent Group Reservation (PGR) based I/O fencing or Coordination Point Server-based I/O fencing. The preferred fencing feature also enables you to specify how the fencing driver determines the surviving subcluster.
■ Support for sharing all types of files, in addition to Oracle database files, across nodes.
How SF Oracle RAC works (high-level perspective)

At a conceptual level, SF Oracle RAC is a cluster that manages applications (instances), networking, and storage components using resources contained in service groups. SF Oracle RAC clusters have the following properties:

■ Each node runs its own operating system.
■ A cluster interconnect enables cluster communications.
■ A public network connects each node to a LAN for client access.
The basic layout has the following characteristics:

■ Multiple client applications that access nodes in the cluster over a public network.
■ Nodes that are connected by at least two private network links (also called cluster interconnects) using 100BaseT or Gigabit Ethernet controllers on each system. If the private links are on a single switch, isolate them using VLAN.
Figure 1-2: SF Oracle RAC architecture (diagram: two nodes, each running Oracle Clusterware/Grid Infrastructure, a database instance, and HAD over the CFS/CVM/VCSMM/LMX/Vxfen/GAB/LLT stack, exchanging Cache Fusion traffic and sharing databases that contain control files, datafiles, redo log files, and temp files; an OCI client connects to the cluster)

SF Oracle RAC provides an environment that can tolerate failures with minimal downtime.
Component products and processes of SF Oracle RAC

Table 1-1 lists the component products of SF Oracle RAC.

Table 1-1: SF Oracle RAC component products

Component product: Cluster Volume Manager (CVM)
Description: Enables simultaneous access to shared volumes based on technology from Veritas Volume Manager (VxVM). See "Cluster Volume Manager (CVM)" on page 30.
Figure 1-3: Data stack (diagram: on each of two nodes, an Oracle RAC instance with its LGWR, ARCH, CKPT, and DBWR processes drives data flow through ODM, CFS, and CVM, performing disk I/O to shared databases that contain control files, datafiles, redo log files, and temp files)

Communication requirements

End-users on a client system are unaware that they are accessing a database hosted by multiple instances.
Figure 1-4: Communication stack (diagram: on each node, Oracle RAC exchanges Cache Fusion/lock management traffic, ODM handles data file management, CFS handles file system metadata, and CVM handles volume management, all over LLT/GAB; VCS Core exchanges cluster state)

Cluster interconnect communication channel

The cluster interconnect provides an additional communication channel for all system-to-system communication, separate from the one-node communication between modules.
traffic is redirected to the remaining links. A maximum of eight network links is supported.

■ Heartbeat: LLT is responsible for sending and receiving heartbeat traffic over each configured network link. The heartbeat traffic is point-to-point unicast. LLT uses Ethernet broadcast to learn the address of the nodes in the cluster.

If all of the configured high-priority links fail, LLT switches all cluster communications traffic to the first available low-priority link. Communication traffic reverts to the high-priority links as soon as they become available.
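The high-priority/low-priority distinction is declared in the LLT configuration file /etc/llttab, where high-priority links use the link directive and low-priority links use link-lowpri. As a minimal sketch (the node name, cluster ID, and device names below are hypothetical examples, not values from your cluster), a small shell function can count each link type in an llttab-style file:

```shell
#!/bin/sh
# Count high- and low-priority LLT links declared in an llttab-style file.
# High-priority links use the "link" directive; low-priority links use
# "link-lowpri". All names and devices below are illustrative only.
count_llt_links() {
    llttab="$1"
    high=$(grep -c '^link ' "$llttab")
    low=$(grep -c '^link-lowpri ' "$llttab")
    echo "high-priority=$high low-priority=$low"
}

# Sample file resembling a two-high-link, one-low-link configuration:
cat > /tmp/llttab.sample <<'EOF'
set-node galaxy
set-cluster 7
link lan1 /dev/lan:1 - ether - -
link lan2 /dev/lan:2 - ether - -
link-lowpri lan0 /dev/lan:0 - ether - -
EOF

count_llt_links /tmp/llttab.sample
# prints: high-priority=2 low-priority=1
```

On a live node you would run the check against the real /etc/llttab instead of the sample file.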
Group Membership Services/Atomic Broadcast

The GAB protocol is responsible for cluster membership and cluster communications. Figure 1-6 shows the cluster communication using GAB messaging.

For example, CVM must initiate volume recovery and CFS must perform a fast parallel file system check. When systems start receiving heartbeats from a peer outside of the current membership, a protocol enables the peer to join the membership.

■ Cluster communications: GAB provides reliable cluster communication between SF Oracle RAC modules.
Cluster Volume Manager (CVM)

CVM is an extension of Veritas Volume Manager, the industry-standard storage virtualization platform. CVM extends the concepts of VxVM across multiple nodes. Each node recognizes the same logical volume layout, and more importantly, the same state of all volume resources.

When a node loses contact with a specific disk, CVM excludes the node from participating in the use of that disk.

CVM communication

CVM communication involves the following GAB ports:

■ Port w: Most CVM communication uses port w for vxconfigd communications.
For database files, when ODM is enabled with the SmartSync option, Oracle Resilvering handles recovery of mirrored volumes. For non-database files, this recovery is optimized using Dirty Region Logging (DRL). The DRL is a map stored in a special-purpose VxVM subdisk and attached as an additional plex to the mirrored volume.
metadata, such as inodes and free lists. The role of GLM is set on a per-file system basis to enable load balancing.

CFS involves a primary/secondary architecture. One of the nodes in the cluster is the primary node for a file system. Though any node can initiate an operation to create, delete, or resize data, the GLM master node carries out the actual operation.
CFS recovery

The vxfsckd daemon is responsible for ensuring file system consistency when a node that was the primary for a shared file system crashes.
node. For example, before creating a new data file with a specific name, ODM checks with other nodes to see if the file name is already in use.

Veritas ODM performance enhancements

Veritas ODM enables the following performance benefits provided by Oracle Disk Manager:

■ Locking for data integrity.
■ Fewer system calls and context switches.
■ Increased I/O parallelism.
status by communicating over GAB and LLT. HAD manages all application services using agents, which are installed programs to manage resources (specific hardware or software entities). The VCS architecture is modular for extensibility and efficiency. HAD does not need to know how to start up Oracle or any other application under VCS control.
the mount point or not. To determine this, the monitor function performs operations such as mount table scans or statfs equivalents.

With the intelligent monitoring framework (IMF), VCS supports intelligent resource monitoring in addition to poll-based monitoring. Process-based and Mount-based agents are IMF-aware. IMF is an extension to the VCS agent framework.
See the Veritas Cluster Server Agent for Oracle Installation and Configuration Guide for IMF-aware agents for Oracle. See the Veritas Storage Foundation Cluster File System Installation Guide for IMF-aware agents in CFS environments.

How intelligent resource monitoring works

When an IMF-enabled agent starts up, the agent initializes the IMF notification module.
Oracle Clusterware/Grid Infrastructure

Oracle Clusterware/Grid Infrastructure manages Oracle cluster-related functions including membership, group services, global resource management, and databases. Oracle Clusterware/Grid Infrastructure is required for every Oracle RAC instance.
are brought online by VCS before Oracle Clusterware/Grid Infrastructure starts. This prevents the premature startup of Oracle Clusterware/Grid Infrastructure, which would otherwise cause cluster failures.

Oracle Cluster Registry

The Oracle Cluster Registry (OCR) contains cluster and database configuration and state information for Oracle RAC and Oracle Clusterware/Grid Infrastructure.
Unlike VCS, Oracle Clusterware/Grid Infrastructure uses separate resources for components that run in parallel on multiple nodes.

Resource profiles

Resources are defined by profiles, which are similar to the attributes that define VCS resources. The OCR contains application resource profiles, dependencies, and status information.
to make ioctl calls to VCSMM, which in turn obtains membership information for clusters and instances by communicating with GAB on port o.

Veritas Cluster Server inter-process communication

To coordinate access to a single database by multiple instances, Oracle uses extensive communications between nodes and instances.
About preventing data corruption with I/O fencing

I/O fencing is a feature that prevents data corruption in the event of a communication breakdown in a cluster. To provide high availability, the cluster must be capable of taking corrective action when a node fails. In this situation, SF Oracle RAC configures its components to reflect the altered membership.
Figure 1-8: Private network disruption and I/O fencing solution (diagram: four Order Entry nodes connected by a public network and a private heartbeat network; when the private network is disrupted, the I/O fencing solution at the disk array prevents data corruption)

About SCSI-3 Persistent Reservations

SCSI-3 Persistent Reservations (SCSI-3 PR) are required for I/O fencing and resolve the issues of using SCSI reservations.
to the device. A single preempt and abort command ejects a node from all paths to the storage device.

About I/O fencing operations

I/O fencing, provided by the kernel-based fencing module (vxfen), performs identically on node failures and communications failures.
About data disks

Data disks are standard disk devices for data storage and are either physical disks or RAID Logical Units (LUNs). These disks must support SCSI-3 PR and must be part of standard VxVM or CVM disk groups. CVM is responsible for fencing data disks on a disk group basis. Disks that are added to a disk group and new paths that are discovered for a device are automatically fenced.
The coordination point server (CP server) is a software solution that runs on a remote system or cluster.
How preferred fencing works

The I/O fencing driver uses coordination points to prevent split-brain in a VCS cluster. At the time of a network partition, the fencing driver in each subcluster races for the coordination points. The subcluster that grabs the majority of coordination points survives, whereas the fencing driver causes a system panic on nodes from all other subclusters.
When the PreferredFencingPolicy attribute value is set to Group, VCS calculates node weight based on the group-level attribute Priority for those service groups that are active.

See the Veritas Cluster Server Administrator's Guide for more information on the VCS attributes.
See "Enabling or disabling the preferred fencing policy" on page 131.
Table 1-2: I/O fencing scenarios (continued)

Event: Node A hangs. Node A is extremely busy for some reason or is in the kernel debugger.
Node B: Loses heartbeats with Node A, and races for a majority of coordinator disks.
Operator action: Verify that the private networks function and restart Node A.

Event: Nodes A and B and private networks lose power. Coordinator and data disks retain power.
Node A: Restarts, and the I/O fencing driver (vxfen) detects that Node B is registered with the coordinator disks. The driver does not see Node B listed as a member of the cluster because the private networks are down.

Event: Node A crashes while Node B is down. Node B comes up and Node A is still down.
Node A: Node A is crashed.
Node B: Restarts and detects that Node A is registered with the coordinator disks. The driver does not see Node A listed as a member of the cluster.

Event: The disk array containing two of the three coordinator disks is powered off.
Node A: Continues to operate in the cluster.
Node B: Node B has left the cluster.
Operator action: Power on the failed disk array so that a subsequent network partition does not cause cluster shutdown, or replace the coordinator disks.
Typical SF Oracle RAC cluster configuration with server-based I/O fencing

Figure 1-9 displays a configuration using a SF Oracle RAC cluster (with two nodes), a single CP server, and two coordinator disks. The nodes within the SF Oracle RAC cluster are connected to and communicate with each other using LLT links.
See Figure 1-10 on page 55. In such a configuration, if the site with two coordinator disks is inaccessible, the other site does not survive because it lacks a majority of coordination points. I/O fencing would require extending the SAN to the third site, which may not be a suitable solution. An alternative is to place a CP server at a remote site as the third coordination point.
Note: Symantec does not support a configuration where multiple CP servers are configured on the same machine.

Deployment and migration scenarios for CP server

Table 1-3 describes the supported deployment and migration scenarios, and the procedures you must perform on the SF Oracle RAC cluster and the CP server.
Table 1-3: CP server deployment and migration scenarios (continued)

Scenario: Replace the coordination point from an existing CP server with an operational CP server coordination point.
CP server: Operational CP server.
SF Oracle RAC cluster: Existing SF Oracle RAC cluster using CP server as coordination point.
Action required: On the designated CP server, prepare to configure the new CP server manually.

Scenario: Enabling fencing in a SF Oracle RAC cluster with an operational CP server coordination point.
CP server: Operational CP server.
SF Oracle RAC cluster: Existing SF Oracle RAC cluster with fencing configured in disabled mode.
Note: Migrating from fencing in disabled mode to customized mode incurs application downtime.

Scenario: Enabling fencing in a SF Oracle RAC cluster with a new CP server coordination point.
CP server: New CP server.
SF Oracle RAC cluster: Existing SF Oracle RAC cluster with fencing configured in scsi3 mode.
Action required: On the designated CP server, perform the following tasks:
1 Prepare to configure the new CP server.

Scenario: Enabling fencing in a SF Oracle RAC cluster with an operational CP server coordination point.
CP server: Operational CP server.
SF Oracle RAC cluster: Existing SF Oracle RAC cluster with fencing configured in disabled mode.
Action required: On the designated CP server, prepare to configure the new CP server.

Scenario: Refreshing registrations of SF Oracle RAC cluster nodes on coordination points (CP servers/coordinator disks) without incurring application downtime.
CP server: Operational CP server.
SF Oracle RAC cluster: Existing SF Oracle RAC cluster using the CP server as coordination point.
Action required: On the SF Oracle RAC cluster, run the vxfenswap utility.
To migrate from disk-based fencing to server-based fencing

1 Make sure system-to-system communication is functioning properly.
2 Make sure that the SF Oracle RAC cluster is online and uses disk-based fencing.
The vxfenswap utility rolls back the migration operation.

■ If you want to commit the new fencing configuration changes, answer y at the prompt.

Do you wish to commit this change? [y/n] (default: n) y

If the utility successfully commits, the utility moves the /etc/vxfenmode.test file to the /etc/vxfenmode file.

7 After the migration is complete, verify the change in the fencing mode.
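One way to verify the mode after the commit is to read the vxfen_mode setting from the configuration file. The sketch below is illustrative: it parses a sample file standing in for the real /etc/vxfenmode rather than touching a live cluster, and assumes the standard vxfen_mode=/vxfen_mechanism= key format shown elsewhere in this guide:

```shell
#!/bin/sh
# Read the vxfen_mode setting from a vxfenmode-style file.
# After migrating to server-based fencing, the mode should be "customized".
get_vxfen_mode() {
    # Skip comment lines; print the value after vxfen_mode=.
    sed -n 's/^vxfen_mode=\(.*\)$/\1/p' "$1" | tail -1
}

# Sample file standing in for /etc/vxfenmode after the migration:
cat > /tmp/vxfenmode.sample <<'EOF'
# vxfen_mode determines in what mode VCS I/O fencing should work.
vxfen_mode=customized
vxfen_mechanism=cps
EOF

get_vxfen_mode /tmp/vxfenmode.sample
# prints: customized
```

On a live node you would point the function at /etc/vxfenmode itself.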
See "Migrating from disk-based to server-based fencing in an online cluster" on page 61.

To migrate from server-based fencing to disk-based fencing

1 Make sure system-to-system communication is functioning properly.
2 Make sure that the SF Oracle RAC cluster is online.
■ If you do not want to commit the new fencing configuration changes, press Enter or answer n at the prompt.

Do you wish to commit this change? [y/n] (default: n) n

The vxfenswap utility rolls back the migration operation.

■ If you want to commit the new fencing configuration changes, answer y at the prompt.
In both configurations, VCS provides local start and stop of the CP server process, taking care of dependencies such as the NIC and the IP address. Moreover, VCS also restarts the CP server process if the process faults. To make the CP server process highly available, you must perform the following tasks:

■ Install and configure SFHA on the CP server systems.
Figure 1-11: CP server and SF Oracle RAC clusters with authentication broker and root broker (diagram: a server hosting the root broker and an authentication broker, with client clusters whose nodes each run their own authentication broker)

Entities on behalf of which authentication is done are referred to as principals.
Figure 1-12: End-to-end communication flow with security enabled on CP server and SF Oracle RAC clusters (diagram: the CP server process (vxcpserv) and the CP client (cpsadm) on the client cluster nodes each authenticate through their local authentication brokers, which are vouched for by the root broker)

Communication flow between the CP server and SF Oracle RAC cluster nodes with security configured on them is as follows:

■ Initial setup: Identities of the authentication brokers are established with the root broker.
The CP server process (vxcpserv) uses its own user (_CPS_SERVER_), which is added to the local authentication broker during server startup.

■ Getting credentials from authentication broker: The cpsadm command tries to get the existing credentials from the authentication broker running on the local node. If this fails, it tries to authenticate itself with the local authentication broker.
Note: The CP server configuration file (/etc/vxcps.conf) must not contain a line specifying security=0. If there is no line specifying the "security" parameter, or if there is a line specifying security=1, the CP server starts with security enabled (which is the default).
Note: The configuration file (/etc/vxfenmode) on each client node must not contain a line specifying security=0. If there is no line specifying the "security" parameter, or if there is a line specifying security=1, the client node starts with security enabled (which is the default).

Settings in non-secure mode

In non-secure mode, only authorization is provided on the CP server. Passwords are not requested.
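Taken together, the two notes above describe a default-on setting. As an illustrative sketch (the surrounding lines of the real files are not shown; only the security parameter itself is taken from the notes above), the relevant entries would look like this:

```shell
# Fragment of /etc/vxcps.conf on the CP server.
# Security is the default: either omit the "security" line entirely,
# or set it to 1. Never add security=0 to a secure deployment.
security=1

# Fragment of /etc/vxfenmode on each client node -- the same convention
# applies on the client side.
security=1
```

Removing the security=1 lines leaves the same default-enabled behavior; only an explicit security=0 disables security.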
The health check utility is installed at /opt/VRTSvcs/rac/healthcheck/healthcheck during the installation of SF Oracle RAC. The utility determines the health of the components by gathering information from configuration files or by reading the threshold values set in the health check configuration file /opt/VRTSvcs/rac/healthcheck/healthcheck.cf.
Chapter 2 Administering SF Oracle RAC and its components

This chapter includes the following topics:

■ Administering SF Oracle RAC
■ Administering VCS
■ Administering I/O fencing
■ Administering the CP server
■ Administering CFS
■ Administering CVM
■ Administering SF Oracle RAC global clusters

Administering SF Oracle RAC

This section provides instructions for the following SF Oracle RAC administration tasks:

■ Setting the environment variables
See "Setting the environment variables" on page 74.
See "Installing Veritas Volume Manager, Veritas File System, or ODM patches on SF Oracle RAC nodes" on page 80.

■ Applying operating system updates
See "Applying operating system updates" on page 81.
■ Determining the LMX traffic for each database
See "Determining the LMX traffic for each database" on page 82.
■ Adding storage to an SF Oracle RAC cluster
See "Adding storage to an SF Oracle RAC cluster" on page 84.
Set the PATH environment variable in the .profile file (or other appropriate shell setup file for your system) on each system to include installation and other commands.

Note: Do not define $ORACLE_HOME/lib in LIBPATH for the root user. You should define $ORACLE_HOME/lib in LIBPATH for the oracle user.
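As a minimal sketch of such a .profile entry for the oracle user (the ORACLE_HOME path below is a hypothetical example, not a value from this guide; substitute your actual installation paths):

```shell
# Example .profile fragment for the oracle user.
# The ORACLE_HOME value is illustrative only -- use your real path.
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1

# Add the Veritas and Oracle command directories to PATH.
export PATH=$PATH:/opt/VRTS/bin:/opt/VRTSvcs/bin:$ORACLE_HOME/bin

# Per the note above: define $ORACLE_HOME/lib in LIBPATH for the oracle
# user only -- never in the root user's environment.
export LIBPATH=$ORACLE_HOME/lib:$LIBPATH
```

The root user's .profile would include the same PATH additions but omit the LIBPATH line.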
To start SF Oracle RAC using the SF Oracle RAC installer

1 Log into one of the nodes in the cluster as the root user.
2 Start SF Oracle RAC:
# /opt/VRTS/install/installsfrac -start galaxy nebula

Starting SF Oracle RAC manually on each node

Perform the steps in the following procedures to start SF Oracle RAC manually on each node.

To start SF Oracle RAC manually on each node

1 Log into each node as the root user.
8 Start VCS, CVM, and CFS:
# hastart

9 Verify that all GAB ports are up and running:
# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 3c4c05 membership 01
Port b gen 3c4c0b membership 01
Port d gen 3c4c06 membership 01
Port f gen 3c4c15 membership 01
Port h gen 3c4c19 membership 01
Port o gen 3c4c0d membership 01
Port u gen 3c4c17 membership 01
Port v gen 3c4c11 membership 01
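The port check in step 9 can be scripted. The sketch below is illustrative: it parses a sample of gabconfig -a output rather than querying a live cluster, and the generation numbers shown are hypothetical. It verifies that every expected GAB port appears in the membership list:

```shell
#!/bin/sh
# Check that every expected GAB port appears in `gabconfig -a` output.
# The sample output below is illustrative, not from a live cluster.
check_gab_ports() {
    output="$1"
    missing=""
    for p in a b d f h o u v w; do
        echo "$output" | grep -q "^Port $p " || missing="$missing $p"
    done
    if [ -z "$missing" ]; then
        echo "all ports up"
    else
        echo "missing ports:$missing"
    fi
}

sample="Port a gen 3c4c05 membership 01
Port b gen 3c4c0b membership 01
Port d gen 3c4c06 membership 01
Port f gen 3c4c15 membership 01
Port h gen 3c4c19 membership 01
Port o gen 3c4c0d membership 01
Port u gen 3c4c17 membership 01
Port v gen 3c4c11 membership 01
Port w gen 3c4c12 membership 01"

check_gab_ports "$sample"
# prints: all ports up
```

On a live node you would pass the real output instead: check_gab_ports "$(gabconfig -a)".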
To stop SF Oracle RAC manually on each node

1 Stop the Oracle database.
5 Stop VCS, CVM and CFS:
# hastop -local

Verify that the ports 'f', 'u', 'v', 'w' and 'h' are closed:
# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 761f03 membership 01
Port b gen 761f08 membership 01
Port d gen 761f02 membership 01
Port o gen 761f01 membership 01

6 Stop ODM:
# /sbin/init.
80 Administering SF Oracle RAC and its components Administering SF Oracle RAC To apply Oracle patches 1 Stop the Oracle database. If the database instances are not managed by VCS, run the following on one of the nodes in the cluster: $ srvctl stop database -d db_name If the database instances are managed by VCS, take the corresponding VCS service groups offline. As superuser, enter: # hagrp -offline group_name -any 2 Install the patches or patchsets required for your Oracle RAC installation.
3 Install the VxVM, VxFS, or ODM patch:
# cd patch_dir
where patch_dir is the directory that contains the VxVM, VxFS, or ODM patch you want to install.
# swinstall -x mount_all_filesystems=false -x \
autoreboot=true -s `pwd` patch_list
4 Restart the nodes:
# shutdown -r now
The nodes form a cluster after restarting, and the applications that are managed by VCS come online.
82 Administering SF Oracle RAC and its components Administering SF Oracle RAC 5 Reboot the node: # shutdown -r now 6 Repeat all the steps on each node in the cluster. Determining the LMX traffic for each database Use the lmxdbstat utility to determine the LMX bandwidth used for database traffic for each database. The utility is located at /sbin/lmxdbstat.
-p pid        Displays the statistics for a database process. You need to specify the process ID of the process.
-d db1 [db2]  Displays the statistics for a database. Specify more than one database when you want to compare database traffic between multiple databases.
interval      Indicates the period of time in seconds over which LMX statistics are gathered for a database. The default value is 0.
Table 2-1 Using lmxdbstat utility to view LMX traffic for databases (continued)
Scenario: To collect the statistics at a particular interval or frequency for a particular database or for all databases.
Command:
# lmxdbstat interval count
For example, to gather LMX statistics for all databases, 3 times, each for an interval of 10 seconds:
# lmxdbstat 10 3
Adding storage to an SF Oracle RAC cluster
You can add storage to an SF Oracle RAC cluster
Administering SF Oracle RAC and its components Administering SF Oracle RAC To extend the volume space on a disk group 1 Determine the length by which you can increase an existing volume. # vxassist [-g diskgroup] maxgrow volume_name For example, to determine the maximum size the volume oradatavol in the disk group oradatadg can grow, given its attributes and free storage available: # vxassist -g oradatadg maxgrow oradatavol 2 Extend the volume, as required.
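The two steps above can be chained: parse the length that vxassist maxgrow reports, then pass it to vxassist growby. The helper below is a sketch of the parsing only; the output format it assumes ("... can be extended by <length> to: <total> ...") is our assumption and may vary by VxVM release.

```shell
#!/bin/sh
# Sketch: extract the extend-by length from one line of
# `vxassist maxgrow` output (assumed format, see lead-in).
max_extend_len() {
    echo "$1" | sed -n 's/.*can be extended by \([0-9][0-9]*\) .*/\1/p'
}

# Usage sketch (requires VxVM, not run here):
#   len=$(max_extend_len "$(vxassist -g oradatadg maxgrow oradatavol)")
#   [ -n "$len" ] && vxassist -g oradatadg growby oradatavol "$len"
```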
86 Administering SF Oracle RAC and its components Administering SF Oracle RAC For information on various failure and recovery scenarios, see the Veritas Volume Manager Troubleshooting Guide. Enhancing the performance of SF Oracle RAC clusters The main components of clustering that impact the performance of an SF Oracle RAC cluster are: ■ Kernel components, specifically LLT and GAB ■ VCS engine (had) ■ VCS agents Each VCS agent process has two components—the agent framework and the agent functions.
Administering SF Oracle RAC and its components Administering SF Oracle RAC data. If required, you can offload processing of the point-in-time copies onto another host to avoid contention for system resources on your production server. For instructions on creating snapshots for offhost processing, see the Veritas Storage Foundation: Storage and Availability Management for Oracle Databases guide.
88 Administering SF Oracle RAC and its components Administering SF Oracle RAC # cd /opt/VRTSvcs/rac/healthcheck/ # ./healthcheck If you want to schedule periodic health checks for your cluster, create a cron job that runs on each node in the cluster. Redirect the health check report to a file. Verifying the ODM port It is recommended to enable ODM in SF Oracle RAC.
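To schedule the periodic health check mentioned above, a root crontab entry such as the following can be used on each node. The schedule and report file are illustrative choices, not values mandated by this guide:

```shell
# Run the health check nightly at 01:00 and append the report to a file
# (example schedule and path; add via `crontab -e` as root on each node)
0 1 * * * cd /opt/VRTSvcs/rac/healthcheck && ./healthcheck >> /var/tmp/healthcheck.log 2>&1
```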
Administering SF Oracle RAC and its components Administering SF Oracle RAC Table 2-2 Options for verifying the nodes in a cluster (continued) Type of check Description SF Oracle RAC health checks SF Oracle RAC provides a health check utility that examines the functional health of the components in an SF Oracle RAC cluster. The utility when invoked gathers real-time operational information on cluster components and displays the report on your system console.
90 Administering SF Oracle RAC and its components Administering SF Oracle RAC ■ Verifies that the Oracle RAC database version is the same on all nodes in the cluster. The check fails if the version or patchset information varies across nodes. For information on resolving issues that you may encounter during the checks: See “Troubleshooting installation and configuration check failures” on page 172.
Administering SF Oracle RAC and its components Administering VCS 5 Enter the full path of the Oracle RAC database home directory. Note: The installer tries to discover the location of the Oracle RAC database home directory from the Oracle inventory. If the installer discovers the information, you are not prompted for this information. The installer verifies the path information. 6 Enter the Oracle user name.
92 Administering SF Oracle RAC and its components Administering VCS ■ Loading Veritas drivers into memory See “Loading Veritas drivers into memory” on page 93. ■ Verifying VCS configuration See “Verifying VCS configuration” on page 93. ■ Starting and stopping VCS See “Starting and stopping VCS” on page 93. ■ Environment variables to start and stop VCS modules See “Environment variables to start and stop VCS modules” on page 93.
vxdmp       static   best       auto-loadable, unloadable
vxfen       loaded   explicit   loadable, unloadable
vxfs        unused
vxfs50      static   best       auto-loadable, unloadable
vxglm       loaded   explicit   auto-loadable, unloadable
vxgms       loaded   explicit   auto-loadable, unloadable
vxportal    unused
vxportal50  static   best       loadable, unloadable
Loading Veritas drivers into memory
Under normal operational conditions, you do not need to load Veritas drivers into memory.
94 Administering SF Oracle RAC and its components Administering VCS Note: The startup and shutdown of AMF, LLT, GAB, VxFEN, and VCS engine are inter-dependent. For a clean startup or shutdown of VCS, you must either enable or disable the startup and shutdown modes for all these modules. Table 2-3 describes the start and stop variables for VCS. Table 2-3 Start and stop environment variables for VCS Environment variable Definition and default value AMF_START Startup mode for the AMF driver.
Administering SF Oracle RAC and its components Administering VCS Table 2-3 Start and stop environment variables for VCS (continued) Environment variable Definition and default value GAB_STOP Shutdown mode for GAB. By default, GAB is enabled to stop during a system shutdown. This environment variable is defined in the following file: /etc/rc.config.d/gabconf Default: 1 VXFEN_START Startup mode for VxFEN. By default, VxFEN is enabled to start up after a system reboot.
96 Administering SF Oracle RAC and its components Administering VCS is not coordinated correctly. For example, if only the Oracle Clusterware links are down, Oracle Clusterware kills one set of nodes after the expiry of the css-misscount interval and initiates the Oracle Clusterware and database recovery, even before CVM and CFS detect the node failures. This uncoordinated recovery may cause data corruption.
Where:
devtag    Tag to identify the link
device    Network device path of the interface
          For link type ether, the path is followed by a colon (:) and an integer which specifies the unit or PPA used by LLT to attach. For link types udp and udp6, the device is the udp and udp6 device path respectively.
98 Administering SF Oracle RAC and its components Administering VCS # lltconfig -t link1 -d udp6 -b udp6 -I 2000::1 2 If you want to configure the link under PrivNIC or MultiPrivNIC as a failover target in the case of link failure, modify the PrivNIC or MultiPrivNIC configuration as follows: # haconf -makerw # hares -modify resource_name Device device device_id [-sys hostname] # haconf -dump -makero The following is an example of configuring the link under PrivNIC.
Administering SF Oracle RAC and its components Administering VCS To remove an LLT link 1 Run the following command to remove a network link that is configured under LLT: # lltconfig -u devtag 2 If the link you removed is configured as a PrivNIC or MultiPrivNIC resource, you also need to modify the resource configuration after removing the link.
100 Administering SF Oracle RAC and its components Administering VCS To configure aggregated interfaces under LLT by editing the /etc/llttab file 1 If LLT is running, run the following command to stop LLT: 2 Add the following entry to the /etc/llttab file to configure an aggregated interface.
To configure aggregated interfaces under LLT using the lltconfig command
◆ When LLT is running, use the following command to configure an aggregated interface:
lltconfig -t devtag -d device [-b linktype] [-s SAP] [-m mtu]
Where:
devtag    Tag to identify the link
device    Network device path of the aggregated interface
          The path is followed by a colon (:) and an integer which specifies the unit or PPA used by LLT to attach.
102 Administering SF Oracle RAC and its components Administering VCS To display the cluster details and LLT version for LLT links ◆ Run the following command to display the details: # /opt/VRTSllt/lltdump -D -f link For example, if lan3 is connected to galaxy, then the command displays a list of all cluster IDs and node IDs present on the network link lan3.
Administering SF Oracle RAC and its components Administering VCS default. The IMF resource type attribute determines whether an IMF-aware agent must perform intelligent resource monitoring. See “About resource monitoring” on page 36. To enable intelligent resource monitoring 1 Make the VCS configuration writable. # haconf -makerw 2 Run the following command to enable intelligent resource monitoring.
104 Administering SF Oracle RAC and its components Administering VCS 5 Make sure that the AMF kernel driver is configured on all nodes in the cluster. /sbin/init.d/amf status Configure the AMF driver if the command output returns that the AMF driver is not loaded or not configured. See “Administering the AMF kernel driver” on page 104. 6 Restart the agent. Run the following commands on each node.
Administering SF Oracle RAC and its components Administering I/O fencing To start the AMF kernel driver 1 Set the value of the AMF_START variable to 1 in the following file: /etc/rc.config.d/amf 2 Start the AMF kernel driver. Run the following command: /sbin/init.d/amf start To stop the AMF kernel driver 1 Stop the AMF kernel driver. Run the following command: /sbin/init.d/amf stop 2 Set the value of the AMF_START variable to 0 in the following file: /etc/rc.config.
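Step 1 of each procedure above edits a VARIABLE=value line in a file under /etc/rc.config.d. The helper below is our own sketch of that edit; the file and variable names (such as AMF_START in /etc/rc.config.d/amf) come from the guide, but the sed-based update does not.

```shell
#!/bin/sh
# Sketch: set a start/stop variable (e.g. AMF_START=0) in an
# /etc/rc.config.d configuration file, adding it if absent.
set_rc_var() {
    file=$1 var=$2 val=$3
    if grep -q "^${var}=" "$file"; then
        # replace the existing assignment
        sed "s/^${var}=.*/${var}=${val}/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    else
        echo "${var}=${val}" >> "$file"
    fi
}
```

A call such as set_rc_var /etc/rc.config.d/amf AMF_START 0 would then perform the edit the procedure describes.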
About administering I/O fencing
The I/O fencing feature provides the following utilities that are available through the VRTSvxfen depot:
vxfentsthdw    Tests hardware for I/O fencing
               See “About the vxfentsthdw utility” on page 106.
vxfenconfig    Configures and unconfigures I/O fencing
               Checks the list of coordinator disks used by the vxfen driver.
Administering SF Oracle RAC and its components Administering I/O fencing Refer also to the vxfentsthdw(1M) manual page. About general guidelines for using vxfentsthdw utility Review the following guidelines to use the vxfentsthdw utility: ■ The utility requires two systems connected to the shared storage. Caution: The tests overwrite and destroy data on the disks, unless you use the -r option. ■ The two nodes must have ssh (default) or rsh communication.
108 Administering SF Oracle RAC and its components Administering I/O fencing Table 2-4 vxfentsthdw options (continued) vxfentsthdw option Description When to use -r Non-destructive testing. Testing of the disks for SCSI-3 persistent reservations occurs in a non-destructive way; that is, there is only testing for reads, not writes. May be used with -m, -f, or -g options. Use during non-destructive testing. -t Testing of the return value of SCSI TEST UNIT (TUR) command under SCSI-3 reservations.
Table 2-4 vxfentsthdw options (continued)
-f filename    Utility tests system/device combinations listed in a text file. May be used with -r and -t options.
               When to use: for testing several disks.
               See “Testing the shared disks listed in a file using the vxfentsthdw -f option” on page 113.
-g disk_group  Utility tests all disk devices in a specified disk group.
               When to use: for testing many disks and
110 Administering SF Oracle RAC and its components Administering I/O fencing To test the coordinator disk group using vxfentsthdw -c 1 Use the vxfentsthdw command with the -c option. For example: # vxfentsthdw -c vxfencoorddg 2 Enter the nodes you are using to test the coordinator disks: Enter the first node of the cluster: galaxy Enter the second node of the cluster: nebula 3 Review the output of the testing process for both nodes for all disks in the coordinator disk group.
Administering SF Oracle RAC and its components Administering I/O fencing To remove and replace a failed disk 1 Use the vxdiskadm utility to remove the failed disk from the disk group. Refer to the Veritas Volume Manager Administrator’s Guide. 2 Add a new disk to the node, initialize it, and add it to the coordinator disk group.
112 Administering SF Oracle RAC and its components Administering I/O fencing If the utility does not show a message stating a disk is ready, verification has failed. Failure of verification can be the result of an improperly configured disk array. It can also be caused by a bad disk. If the failure is due to a bad disk, remove and replace it.
Administering SF Oracle RAC and its components Administering I/O fencing 6 If a disk is ready for I/O fencing on each node, the utility reports success: ALL tests on the disk /dev/vx/rdmp/c1t1d0 have PASSED The disk is now ready to be configured for I/O Fencing on node galaxy ... Removing test keys and temporary files, if any ... . . 7 Run the vxfentsthdw utility for each disk you intend to verify.
114 Administering SF Oracle RAC and its components Administering I/O fencing After testing, destroy the disk group and put the disks into disk groups as you need. To test all the disks in a diskgroup 1 Create a diskgroup for the disks that you want to test. 2 Enter the following command to test the diskgroup test_disks_dg: # vxfentsthdw -g test_disks_dg The utility reports the test results one disk at a time.
-m    register with disks
-n    make a reservation with disks
-p    remove registrations made by other systems
-r    read reservations
-x    remove registrations
Refer to the vxfenadm(1m) manual page for a complete list of the command options.
About the I/O fencing registration key format
The keys that the vxfen driver registers on the data disks and the coordinator disks consist of eight bytes.
Byte:   0      1  2  3  4        5        6        7
Value:  A+nID  P  G  R  DGcount  DGcount  DGcount  DGcount
where DGcount is the count of disk groups in the configuration.
Displaying the I/O fencing registration keys
You can display the keys that are currently assigned to the disks using the vxfenadm command. The variables such as disk_7, disk_8, and disk_9 in the following procedure represent the disk names in your setup.
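As a worked example of this layout, the following sketch (our own, not a Veritas utility) decodes the comma-separated [Numeric Format] line that vxfenadm -s prints: byte 0 minus ASCII 'A' (65) yields the LLT node ID, and the remaining bytes are printed as characters.

```shell
#!/bin/sh
# Sketch: decode the 8-byte fencing key layout described above from the
# [Numeric Format] line of `vxfenadm -s` output.
decode_fencing_key() {
    echo "$1" | awk -F, '{
        printf "node_id=%d tag=", $1 - 65       # 65 is ASCII "A"
        for (i = 2; i <= NF; i++) printf "%c", $i + 0
        printf "\n"
    }'
}
```

For example, decode_fencing_key "66,80,71,82,48,48,48,49" reports node ID 1 ('B' is 'A' + 1) followed by the remaining ASCII bytes.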
Administering SF Oracle RAC and its components Administering I/O fencing 117 Device Name: /dev/vx/rdmp/disk_7 Total Number Of Keys: 1 key[0]: [Numeric Format]: 66,80,71,82,48,48,48,48 [Character Format]: BPGR0001 [Node Format]: Cluster ID: unknown Node ID: 1 Node Name: nebula ■ To display the keys on a VCS failover disk group: # vxfenadm -s /dev/vx/rdmp/disk_8 Reading SCSI Registration Keys...
118 Administering SF Oracle RAC and its components Administering I/O fencing # lltstat -C 57069 If the disk has keys which do not belong to a specific cluster, then the vxfenadm command cannot look up the node name for the node ID and hence prints the node name as unknown.
Administering SF Oracle RAC and its components Administering I/O fencing To verify that the nodes see the same disks 1 Verify the connection of the shared storage for data to two of the nodes on which you installed SF Oracle RAC.
120 Administering SF Oracle RAC and its components Administering I/O fencing You can also use this procedure to remove the registration and reservation keys created by another node from a disk. To clear keys after split-brain 1 Stop VCS on all nodes. # hastop -all 2 Make sure that the port h is closed on all the nodes. Run the following command on each node to verify that the port h is closed: # gabconfig -a Port h must not appear in the output. 3 Stop I/O fencing on all nodes.
Administering SF Oracle RAC and its components Administering I/O fencing 6 Read the script’s introduction and warning. Then, you can choose to let the script run. Do you still want to continue: [y/n] (default : n) y In some cases, informational messages resembling the following may appear on the console of one of the nodes in the cluster when a node is ejected from a disk/LUN. You can ignore these informational messages.
122 Administering SF Oracle RAC and its components Administering I/O fencing About the vxfenswap utility The vxfenswap utility allows you to replace coordinator disks in a cluster that is online. The utility verifies that the serial number of the new disks are identical on all the nodes and the new disks can support I/O fencing. This utility also supports server-based fencing. Refer to the vxfenswap(1M) manual page.
Administering SF Oracle RAC and its components Administering I/O fencing You can also use the vxfenswap utility to migrate between the disk-based and the server-based fencing without incurring application downtime in the SF Oracle RAC cluster. See “Migrating from disk-based to server-based fencing in an online cluster” on page 61. See “Migrating from server-based to disk-based fencing in an online cluster” on page 63.
124 Administering SF Oracle RAC and its components Administering I/O fencing To replace a disk in a coordinator diskgroup when the cluster is online 1 Make sure system-to-system communication is functioning properly. 2 Make sure that the cluster is online.
Administering SF Oracle RAC and its components Administering I/O fencing ■ Initialize the new disks as VxVM disks. ■ Check the disks for I/O fencing compliance. ■ Add the new disks to the coordinator disk group and set the coordinator attribute value as "on" for the coordinator disk group. See the Veritas Storage Foundation for Oracle RAC Installation and Configuration Guide for detailed instructions. Note that though the disk group content changes, the I/O fencing remains in the same state.
126 Administering SF Oracle RAC and its components Administering I/O fencing To replace the coordinator diskgroup 1 Make sure system-to-system communication is functioning properly. 2 Make sure that the cluster is online.
Administering SF Oracle RAC and its components Administering I/O fencing 5 Validate the new disk group for I/O fencing compliance. Run the following command: # vxfentsthdw -c vxfendg See “Testing the coordinator disk group using vxfentsthdw -c option” on page 109. 6 If the new disk group is not already deported, run the following command to deport the disk group: # vxdg deport vxfendg 7 Make sure that the /etc/vxfenmode file is updated to specify the correct disk policy.
128 Administering SF Oracle RAC and its components Administering I/O fencing 11 Set the coordinator attribute value as "on" for the new coordinator disk group. # vxdg -g vxfendg set coordinator=on Set the coordinator attribute value as "off" for the old disk group. # vxdg -g vxfencoorddg set coordinator=off 12 Verify that the coordinator disk group has changed. # cat /etc/vxfendg vxfendg The swap operation for the coordinator disk group is complete now.
3 Verify the name of the coordinator diskgroup.
# cat /etc/vxfendg
vxfencoorddg
4 Run the following command:
# vxdisk -o alldgs list
DEVICE   TYPE          DISK   GROUP            STATUS
c1t1d0   auto:cdsdisk  -      (vxfencoorddg)   online
c2t1d0   auto          -      -                offline
c3t1d0   auto          -      -                offline
5 Verify the number of disks used in the coordinator diskgroup.
130 Administering SF Oracle RAC and its components Administering I/O fencing Refreshing lost keys on coordinator disks If the coordinator disks lose the keys that are registered, the cluster might panic when a network partition occurs. You can use the vxfenswap utility to replace the coordinator disks with the same disks. The vxfenswap utility registers the missing keys during the disk replacement.
Administering SF Oracle RAC and its components Administering I/O fencing 4 On any node, run the following command to start the vxfenswap utility: # vxfenswap -g vxfencoorddg [-n] 5 Verify that the keys are atomically placed on the coordinator disks. # vxfenadm -s all -f /etc/vxfentab Device Name: /dev/vx/rdmp/c1t1d0 Total Number of Keys: 4 ... Enabling or disabling the preferred fencing policy You can enable or disable the preferred fencing feature for your I/O fencing configuration.
132 Administering SF Oracle RAC and its components Administering I/O fencing ■ Set the value of the system-level attribute FencingWeight for each node in the cluster. For example, in a two-node cluster, where you want to assign galaxy five times more weight compared to nebula, run the following commands: # hasys -modify galaxy FencingWeight 50 # hasys -modify nebula FencingWeight 10 ■ Save the VCS configuration.
Administering SF Oracle RAC and its components Administering the CP server To disable preferred fencing for the I/O fencing configuration 1 Make sure that the cluster is running with I/O fencing set up. # vxfenadm -d 2 Make sure that the cluster-level attribute UseFence has the value set to SCSI3. # haclus -value UseFence 3 To disable preferred fencing and use the default race policy, set the value of the cluster-level attribute PreferredFencingPolicy as Disabled.
134 Administering SF Oracle RAC and its components Administering the CP server The user types and their access level privileges are assigned to individual users during SF Oracle RAC cluster configuration for fencing. During the installation process, you are prompted for a user name, password, and access level privilege (CP server admin or CP server operator). To administer and operate a CP server, there must be at least one CP server admin.
Administering SF Oracle RAC and its components Administering the CP server Table 2-5 cpsadm command parameters (continued) Parameter Name Description -v victim node id This parameter specifies a victim node's node ID. -p port This parameter specifies the port number to connect to the CP server. -e user name This parameter specifies the user to be added to the CP server. -f user role This parameter specifies the user role (either cps_admin or cps_operator).
136 Administering SF Oracle RAC and its components Administering the CP server Table 2-6 cpsadm command action types (continued) Action Description User type list_membership Lists the membership. CP server admin CP server operator list_nodes Lists all nodes in the current cluster. CP server admin list_users Lists all users. CP server admin ping_cps Pings a CP server.
Administering SF Oracle RAC and its components Administering the CP server Environment variables associated with the coordination point server Table 2-7 describes the environment variables that are required for the cpsadm command. The cpsadm command detects these environment variables and uses their value when communicating with the CP server. They are used to authenticate and authorize the user. Note: The environment variables are not required when the cpsadm command is run on the CP server.
138 Administering SF Oracle RAC and its components Administering the CP server For more information about the cpsadm command and the associated command options, see the cpsadm(1M) manual page.
Administering SF Oracle RAC and its components Administering the CP server Adding or removing CP server users ■ To add a user Type the following command: # cpsadm -s cp_server -a add_user -e user_name -f user_role -g domain_type -u uuid ■ To remove a user Type the following command: # cpsadm -s cp_server -a rm_user -e user_name -g domain_type cp_server The CP server's virtual IP address or virtual hostname. user_name The user to be added to the CP server configuration.
140 Administering SF Oracle RAC and its components Administering the CP server cluster_name The SF Oracle RAC cluster name. Preempting a node To preempt a node Type the following command: # cpsadm -s cp_server -a preempt_node -u uuid -n nodeid -v victim_node id cp_server The CP server's virtual IP address or virtual hostname. uuid The UUID (Universally Unique ID) of the SF Oracle RAC cluster. nodeid The node id of the SF Oracle RAC cluster node. victim_node id The victim node's node id.
Administering SF Oracle RAC and its components Administering the CP server ■ To disable access for a user to a SF Oracle RAC cluster Type the following command: # cpsadm -s cp_server -a rm_clus_from_user -e user_name -f user_role -g domain_type -u uuid cp_server The CP server's virtual IP address or virtual hostname. user_name The user name to be added to the CP server. user_role The user role, either cps_admin or cps_operator. domain_type The domain type, for example vx, unixpwd, nis, etc.
142 Administering SF Oracle RAC and its components Administering the CP server on a CP server if the CP server agent issues an alert on the loss of such registrations on the CP server database. The following procedure describes how to refresh the coordination point registrations. To refresh the registration keys on the coordination points for server-based fencing 1 Ensure that the SF Oracle RAC cluster nodes and users have been added to the new CP server(s).
4 Run the vxfenswap utility from one of the nodes of the cluster. The vxfenswap utility requires a secure ssh connection to all the cluster nodes. Use -n to use rsh instead of the default ssh. For example:
# vxfenswap [-n]
The command returns:
VERITAS vxfenswap version
The logfile generated for vxfenswap is
/var/VRTSvcs/log/vxfen/vxfenswap.log.19156
Please Wait...
144 Administering SF Oracle RAC and its components Administering the CP server Note: If multiple clusters share the same CP server, you must perform this replacement procedure in each cluster. You can use the vxfenswap utility to replace coordination points when fencing is running in customized mode in an online cluster, with vxfen_mechanism=cps.
Administering SF Oracle RAC and its components Administering the CP server To replace coordination points for an online cluster 1 Ensure that the SF Oracle RAC cluster nodes and users have been added to the new CP server(s). Run the following commands: # cpsadm -s cpserver -a list_nodes # cpsadm -s cpserver -a list_users If the SF Oracle RAC cluster nodes are not present here, prepare the new CP server(s) for use by the SF Oracle RAC cluster.
146 Administering SF Oracle RAC and its components Administering the CP server 4 Use a text editor to access /etc/vxfenmode and update the values to the new CP server (coordination points). The values of the /etc/vxfenmode file have to be updated on all the nodes in the SF Oracle RAC cluster. Review and if necessary, update the vxfenmode parameters for security, the coordination points, and if applicable to your configuration, vxfendg.
Administering SF Oracle RAC and its components Administering the CP server 3 147 Ensure that security is configured for communication between CP servers and SF Oracle RAC cluster nodes. See “About secure communication between the SF Oracle RAC cluster and CP server” on page 66. 4 Modify /etc/vxcps.conf on each CP server to set security=1.
7 Authorize the user to administer the cluster. For example, issue the following command on the CP server (mycps.symantecexample.com):
# cpsadm -s mycps.symantecexample.com -a add_clus_to_user \
-c cpcluster \
-u {f0735332-1dd1-11b2-a3cb-e3709c1c73b9} \
-e _HA_VCS_galaxy@HA_SERVICES@galaxy.symantec.com \
-f cps_operator -g vx
Cluster successfully added to user
_HA_VCS_galaxy@HA_SERVICES@galaxy.symantec.com privileges.
Administering SF Oracle RAC and its components Administering CFS Using cfsmount to mount CFS file systems To mount a CFS file system using cfsmount: # cfsmount /oradata1 Mounting... [/dev/vx/dsk/oradatadg/oradatavol] mounted successfully at /oradata1 on galaxy [/dev/vx/dsk/oradatadg/oradatavol] mounted successfully at /oradata1 on nebula See the Veritas Storage Foundation Cluster File System Administrator's Guide for more information on the command.
/arch                archvol        oradatadg   MOUNTED

Node            : nebula
Cluster Manager : running
CVM state       : running

MOUNT POINT          SHARED VOLUME  DISK GROUP  STATUS
/app/crshome         crsbinvol      bindg       MOUNTED
/ocrvote             ocrvotevol     ocrvotedg   MOUNTED
/app/oracle/orahome  orabinvol      bindg       MOUNTED
/oradata1            oradatavol     oradatadg   MOUNTED
/arch                archvol        oradatadg   MOUNTED

Verifying CFS port
CFS uses port ‘f’ for communication between nodes.
Administering SF Oracle RAC and its components Administering CVM If you encounter issues while administering CVM, refer to the troubleshooting section for assistance. See “Troubleshooting CVM” on page 220. Establishing CVM cluster membership manually In most cases you do not have to start CVM manually; it normally starts when VCS is started. Run the following command to start CVM manually: # vxclustadm -m vcs -t gab startnode vxclustadm: initialization completed Note that vxclustadm reads main.
To change the CVM master manually
1 To view the current master, use one of the following commands:
# vxclustadm nidmap
Name      CVM Nid   CM Nid   State
galaxy    0         0        Joined: Slave
nebula    1         1        Joined: Master
# vxdctl -c mode
mode: enabled: cluster active - MASTER
master: nebula
In this example, the CVM master is nebula.
Administering SF Oracle RAC and its components Administering CVM 3 To monitor the master switching, use the following command: # vxclustadm -v nodestate state: cluster member nodeId=0 masterId=0 neighborId=1 members[0]=0xf joiners[0]=0x0 leavers[0]=0x0 members[1]=0x0 joiners[1]=0x0 leavers[1]=0x0 reconfig_seqnum=0x9f9767 vxfen=off state: master switching in progress reconfig: vxconfigd in join In this example, the state indicates that master is being changed.
154 Administering SF Oracle RAC and its components Administering CVM a transaction is in progress. Try again In some cases, if the master switching operation is interrupted with another reconfiguration operation, the master change fails. In this case, the existing master remains the master of the cluster. After the reconfiguration is complete, reissue the vxclustadm setmaster command to change the master.
# vxdctl -c mode
mode: enabled: cluster inactive
# vxclustadm -v nodestate
state: out of cluster
On the master node, the following output is displayed:
# vxdctl -c mode
mode: enabled: cluster active - MASTER
master: galaxy
On the slave nodes, the following output is displayed:
# vxdctl -c mode
mode: enabled: cluster active - SLAVE
master: nebula
The following command lets you view all the CVM nodes at the same time:
# vxclustadm nidmap
Nam
The vxdctl -c mode command indicates whether a node is a CVM master or CVM slave.

Verifying the state of CVM shared disk groups

You can use the following command to list the shared disk groups currently imported in the SF Oracle RAC cluster:

# vxdg list | grep shared
orabinvol_dg enabled,shared 1052685125.1485.
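When scripting a health check, the same `vxdg list | grep shared` filter can be applied field-wise so that only the disk group names are returned. A minimal sketch, using captured sample output with hypothetical disk group names and IDs:

```shell
# Sample 'vxdg list' output (hypothetical names and IDs).
# Live equivalent: vxdg list | awk '$2 ~ /shared/ {print $1}'
vxdg_output='NAME           STATE            ID
orabinvol_dg   enabled,shared   1052685125.1485.host1
oralocal_dg    enabled          1052685126.1486.host1'

# Print only disk groups whose STATE column includes "shared".
shared_dgs=$(printf '%s\n' "$vxdg_output" | awk '$2 ~ /shared/ {print $1}')
echo "$shared_dgs"
```

Matching on the STATE column rather than the whole line avoids false positives if a disk group name itself happens to contain the word "shared".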
Administering SF Oracle RAC global clusters

Setting up a disaster recovery fire drill

The Disaster Recovery Fire Drill procedure tests the fault-readiness of a configuration by mimicking a failover from the primary site to the secondary site. This procedure is done without stopping the application at the primary site, disrupting user access, interrupting the flow of replicated data, or causing the secondary site to need resynchronization.
The wizard performs the following specific tasks:

■ Creates a Cache object to store changed blocks during the fire drill, which minimizes the disk space and disk spindles required to perform the fire drill.
■ Configures a VCS service group that resembles the real application group.

The wizard works only with application groups that contain one disk group. The wizard sets up the first RVG in an application.
5  Enter the cache size to store writes when the snapshot exists. The size of the cache must be large enough to store the expected number of changed blocks during the fire drill. However, the cache is configured to grow automatically if it fills up. Enter the disks on which to create the cache. Press the Enter key when prompted.

6  The wizard starts running commands to create the fire drill setup.
Scheduling a fire drill

You can schedule the fire drill for the service group using the fdsched script. The fdsched script is designed to run only on the lowest-numbered node that is currently running in the cluster. The scheduler runs the command hagrp -online firedrill_group -any at periodic intervals.

To schedule a fire drill

1  Add the file /opt/VRTSvcs/bin/fdsched to your crontab.
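As an illustration, a crontab entry that invokes the scheduler every hour might look like the following; the hourly interval is an assumption, so choose one appropriate for your site. Because fdsched itself runs only on the lowest-numbered running node, the same entry can be installed on every node:

```
# Hypothetical crontab entry: invoke the fire drill scheduler hourly
0 * * * * /opt/VRTSvcs/bin/fdsched
```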
■ The fire drill service group oradb_grp_fd creates a snapshot of the replicated data on the secondary site and starts the database using the snapshot. An offline local dependency is set between the fire drill service group and the application service group to make sure a fire drill does not block an application failover in case a disaster strikes the primary site. Figure 2-1 illustrates the configuration.
Section

Performance and troubleshooting

■ Chapter 3. Troubleshooting SF Oracle RAC
■ Chapter 4. Prevention and recovery strategies
■ Chapter 5.
Chapter 3

Troubleshooting SF Oracle RAC

This chapter includes the following topics:

■ About troubleshooting SF Oracle RAC
■ What to do if you see a licensing reminder
■ Restarting the installer after a failed connection
■ Installer cannot create UUID for the cluster
■ Troubleshooting installation and configuration check failures
■ Troubleshooting LLT health check warning messages
■ Troubleshooting LMX health check warning messages
■ Troubleshooting I/O fencing
■ Troubleshooting CVM
■ Tr
Running scripts for engineering support analysis

Troubleshooting scripts gather information about the configuration and status of your cluster and its modules. The scripts identify package information, debugging messages, console messages, and information about disk groups and volumes. Forwarding the output of these scripts to Symantec Tech Support can assist with analyzing and solving any problems.
Table 3-1    List of log files

Log file: Oracle installation error log
Location: $ORACLE_BASE/oraInventory/logs/installActionsdate_time.log
Description: Contains errors that occurred during Oracle RAC installation. It clarifies the nature of the error and when it occurred during the installation.

Note: Verify if there are any installation errors logged in this file, since they may prove to be critical errors.
Table 3-1    List of log files (continued)

Log file: Agent log file for CVM
Location:
/var/VRTSvcs/log/engine_A.log
/var/VRTSvcs/log/CVMVxconfigd_A.log
/var/VRTSvcs/log/CVMCluster_A.log
/var/VRTSvcs/log/CVMVolDg_A.log
Description: Contains messages and errors related to CVM agent functions. Search for "cvm" in the engine_A.log for debug information. For more information, see the Veritas Volume Manager Administrator's Guide.
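To illustrate the suggested search, the following sketch counts CVM-related lines in a captured excerpt of the engine log; the excerpt and its UMI codes are hypothetical. Against the real file you would run `grep -ci cvm /var/VRTSvcs/log/engine_A.log`.

```shell
# Hypothetical engine_A.log excerpt (UMI codes are illustrative only).
log='2011/04/12 10:31:02 VCS INFO V-16-1-10233 Clearing Restart attribute
2011/04/12 10:32:45 VCS ERROR V-16-2-13067 CVMCluster:cvm_clus:monitor: node state: out of cluster'

# Count the lines that mention CVM, case-insensitively.
cvm_lines=$(printf '%s\n' "$log" | grep -ci "cvm")
echo "$cvm_lines"
```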
Original string:

clust_run=`$VXCLUSTADM -m vcs -t $TRANSPORT startnode 2> $CVM_ERR_FILE`

Modified string:

clust_run=`$VXCLUSTADM -m vcs -t $TRANSPORT -T startnode 2> $CVM_ERR_FILE`

2  Stop the cluster:

   # hastop -all

3  Start the cluster:

   # hastart

At this point, CVM TIME_JOIN messages display in the /var/adm/messages file and on the console.
This GAB logging daemon collects the GAB-related logs when a critical event, such as an iofence or a failure of the master of any GAB port, occurs, and stores the data in a compact binary form.
About SF Oracle RAC kernel and driver messages

SF Oracle RAC drivers such as GAB print messages to the console if the kernel and driver messages are configured to be displayed on the console. Make sure that the kernel and driver messages are logged to the console. For details on how to configure console messages, see the syslog and /etc/syslog.conf files. For more information, see the operating system documentation.
Restarting the installer after a failed connection

If an installation is killed because of a failed connection, you can restart the installer to resume the installation. The installer detects the existing installation. The installer prompts you whether you want to resume the installation. If you resume the installation, the installation proceeds from the point where the installation failed.
Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures Resolution Synchronize the time settings across the nodes. Note: For Oracle RAC 11g Release 2, it is mandatory to configure NTP for synchronizing time on all nodes in the cluster. System architecture checks failed with errors The system architecture check may fail with the following error: Checking system architecture information...
174 Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures Resolution Install the same version of the operating system and patch levels on all nodes in the cluster. CPU frequency checks failed with errors The CPU frequency checks across nodes may fail with the following error: Checking for CPU frequency match...
Table 3-2    LLT Links' full duplex checks - messages

Error message: Checking LLT Links' Full Duplex setting for ...Failed
Cause: The link mode is not set to full duplex.
Resolution: Set the link mode to full duplex.

Error message: Error: /etc/llttab does not exist on sys_name.
Table 3-3    LLT link jumbo frame setting (MTU) checks - messages

Error message: Error: /etc/llttab does not exist on sys_name. LLT link jumbo frame check cannot proceed for sys_name.
Cause: Sometimes the check cannot proceed if the /etc/llttab file does not exist on the node. The file may have been accidentally deleted or the SF Oracle RAC configuration is incomplete.
Resolution: Reconfigure SF Oracle RAC.
Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures You may see the following message in the logs: Error: /etc/llttab does not exist on sys_name. LLT Private Link check can not proceed for sys_name. Table 3-4 lists the messages displayed for the check. Table 3-4 LLT links' speed and auto negotiation checks - messages Error message Cause Resolution Checking LLT Links' speed and auto negotiation settings for sys_name...
Table 3-5 lists the messages displayed for the check.

Table 3-5    LLT link priority checks - messages

Error message: Error: There are only no_of_links High Priority links present on sys_name (less than 2)
Cause: The number of high-priority LLT links in the cluster is fewer than two.
Resolution: Configure at least two high-priority links.
Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures ■ A node has been removed from the cluster but the /etc/llthosts file has not been updated. Resolution Update the /etc/llthosts file to correctly reflect the number of nodes in the cluster. Restart the cluster. Checks on cluster ID failed with errors The cluster ID checks may fail with the following error: Checking for cluster-ID match...
Table 3-6    Checking cluster-ID match - messages (continued)

Error message: Error: Cluster-ID differences between: .
Cause: The /etc/llttab file has been incorrectly modified manually.
Resolution: Update the /etc/llttab file to correctly reflect the cluster IDs of the nodes in the cluster. Restart the cluster.
Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures Table 3-7 Checking fencing configuration - messages (continued) Error message Cause Resolution Failed (Configured in disabled mode) Fencing is configured in disabled mode. Reconfigure disk-based fencing. For instructions, see the chapter "Configuring SF Oracle RAC clusters for data integrity" in the Veritas Storage Foundation for Oracle RAC Installation and Configuration Guide.
182 Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures ODM configuration checks failed with errors The ODM configuration check may fail with the following error: Checking ODM configuration on sys_name...Failed Cause Check for one of the following causes: ■ ODM is not running in the cluster. ■ ODM is running in standalone mode. Resolution To resolve the issue 1 Start ODM: # /sbin/init.
Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures Resolution Start VCSMM. # /sbin/init.d/vcsmm start GAB ports configuration checks failed with errors The GAB ports configuration checks may fail with the following error: Checking GAB ports configuration on sys_name...Failed You may see the following message in the logs: GAB is not configured on node: sys_name Table 3-8 lists the messages displayed for the check.
184 Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures You may see the following messages in the logs: ■ Error: LMX is running with helper thread ENABLED on node: sys_name. LMX helper thread should be disabled. ■ Error: Could not run LMX helper thread check on sys_name due to missing /opt/VRTSvcs/rac/bin/lmxshow file. LMX does not seem to be installed on sys_name. Table 3-9 lists the messages displayed for the check.
Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures Disabling the LMX helper thread To disable the LMX helper thread 1 Stop the Oracle database instance: If the database instances are managed by VCS, take the corresponding VCS service groups offline: $ hagrp -offline group_name -sys system_name If the database instances are not managed by VCS, then run the following on one node: $ srvctl stop instance -d database_name -i instance_name 2 Stop LMX. # /sbin/init.
186 Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures Oracle DB level different on: sys_name1 (version1) and sys_name2 (version2). Problem running 'opatch' utility on sys_name. Error info: error_info Table 3-10 lists the messages displayed for the check. Table 3-10 Oracle database version checks - messages Error message Cause Resolution Oracle DB level different on: sys_name1 version1 and sys_name2 version2.
Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures Oracle Process Daemon (oprocd) checks failed with errors The Oracle process daemon (oprocd) check may fail with the following error: Checking presence of Oracle Process Daemon (oprocd) on sys_name...Failed Table 3-11 lists the messages displayed for the check.
188 Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures ■ Error: There is no node present in Oracle Clusterware membership, which is listed in /etc/llthosts. ■ Error: Node numbering of LLT and Oracle Clusterware is different. This step is mandatory for SF Oracle RAC to function. ■ Error: /etc/llthosts file is missing on sys_name. Node ID match check can not proceed. Skipping. Table 3-12 lists the messages displayed for the check.
Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures Table 3-12 Node ID checks for LLT and Oracle Clusterware - messages (continued) Error message Cause Resolution Error: Node numbering of LLT and Oracle Clusterware is different. The check fails if the LLT configuration has been incorrectly modified after Oracle RAC installation. Make sure that the node ID information in the LLT configuration file corresponds with that of Oracle Clusterware.
190 Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures Table 3-12 Node ID checks for LLT and Oracle Clusterware - messages (continued) Error message Cause Resolution Error: /etc/llthosts file is missing on sys_name. Node ID match check can not proceed. Skipping. The SF Oracle RAC file If the installation is is accidentally removed incomplete or failed, or the installation is reinstall SF Oracle RAC. incomplete.
Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures ■ IPC library linking check on sys_name has failed. Link IPC library properly. ■ VCSMM library check on sys_name has failed Link VCSMM library properly. ■ ODM library check on sys_name has failed Link ODM library properly. Cause The Veritas libraries (VCS IPC, ODM, and VCSMM) are not linked correctly with the Oracle RAC libraries.
192 Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures Resolution Set the correct tunable settings. For more information: See “About SF Oracle RAC tunable parameters” on page 243. Oracle user checks failed with errors The permission checks on the Oracle user may fail with the following error: Checking permissions for ORACLE_USER:oracle_user on sys_name...Failed You may see the following messages in the logs: ■ Error: No Oracle user detected.
Table 3-13    Checking permissions for ORACLE_USER - messages (continued)

Error message: Permission denied for ORACLE_USER:oracle_user for files $vcs->{llthosts} and $vcs->{llttab} on sys_name
Cause: The files do not have read permissions for "other" users.
Resolution: Grant the read permission for the files to "other" users.
194 Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures ■ Error: Either $vcs->{llthosts} or $vcs->{llttab} or both are not available on sys_name. Skipping this check. ■ Error: Oracle user: $prod->{orauser} does not have permission to access $vcs->{llthosts} and $vcs->{llttab}. Table 3-14 lists the messages displayed for the check.
Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures Resolution Check the logs at /opt/VRTS/install/logs to determine which kernel messages have failed. See the Oracle RAC documentation for instructions on setting the kernel parameters correctly. Troubleshooting pre-installation check failures Table 3-15 provides guidelines for resolving failures that may be reported when you start the SF Oracle RAC installation program.
196 Troubleshooting SF Oracle RAC Troubleshooting installation and configuration check failures Table 3-15 Troubleshooting pre-installation check failures (continued) Error message Cause Resolution Checking platform version The version of the operating Make sure that the systems system installed on the meet the software criteria system is not supported. required for the release.
Troubleshooting SF Oracle RAC Troubleshooting LLT health check warning messages Table 3-16 Checking installed product - messages (continued) Message Resolution Entered systems have different versions of $cprod_name installed: prod_name-prod_ver-sys_name. Systems running different product versions must be upgraded independently. Two or more systems specified have different versions of SF Oracle RAC installed. For example, SF Oracle RAC 5.0 is installed on some systems while SF Oracle RAC 5.
Table 3-17    Troubleshooting LLT warning messages

Warning: OS timer is not called for num seconds
Possible causes: CPU and memory consumption on the node is high.
Recommendation: Check for applications that may be throttling the CPU and memory resources on the node.

Warning: Kernel failed to allocate memory
Possible causes: The available memory on the node is insufficient.
Table 3-17    Troubleshooting LLT warning messages (continued)

Warning: Only one link is configured under LLT. Symantec recommends configuring a minimum of two links.
Possible causes:
■ One of the configured private interconnects is non-operational.
■ Only one private interconnect has been configured under LLT.
Recommendation:
■ If the private interconnect is faulty, replace the link.
■ Configure a minimum of two private interconnects.
Table 3-17    Troubleshooting LLT warning messages (continued)

Warning: per% of total transmitted packets are with large xmit latency (>16ms) for port port_id
Possible causes: The CPU and memory consumption on the node may be high, or the network bandwidth is insufficient.
Recommendation: Check for applications that may be throttling CPU and memory usage.

Troubleshooting LMX health check warning messages
Table 3-18    Troubleshooting LMX warning messages (continued)

Warning: Oracle is not linked to the Symantec LMX library. This warning is not applicable for clusters running Oracle 11g.
Possible causes: The Oracle RAC libraries are not linked with the SF Oracle RAC libraries.
Recommendation: Relink the Oracle RAC libraries with SF Oracle RAC. For instructions, see the Veritas Storage Foundation for Oracle RAC Installation and Configuration Guide.

Troubleshooting I/O fencing
The vxfentsthdw utility fails when SCSI TEST UNIT READY command fails

While running the vxfentsthdw utility, you may see a message that resembles the following:

Issuing SCSI TEST UNIT READY to disk reserved by other node FAILED.
Contact the storage provider to have the hardware configuration fixed.
Troubleshooting SF Oracle RAC Troubleshooting I/O fencing See “How vxfen driver checks for preexisting split-brain condition” on page 203. How vxfen driver checks for preexisting split-brain condition The vxfen driver functions to prevent an ejected node from rejoining the cluster after the failure of the private network links and before the private network links are repaired. For example, suppose the cluster of system 1 and system 2 is functioning normally when the private network links are broken.
204 Troubleshooting SF Oracle RAC Troubleshooting I/O fencing However, the same error can occur when the private network links are working and both systems go down, system 1 restarts, and system 2 fails to come back up. From the view of the cluster from system 1, system 2 may still have the registrations on the coordinator disks.
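The check described above can be summarized as: fencing refuses to start if any node holding a registration on the coordinator disks is absent from the current GAB membership. The following is a minimal shell sketch of that decision, using hypothetical node ID sets rather than live vxfen data:

```shell
# Hypothetical inputs: node IDs registered on the coordinator disks,
# and node IDs currently present in the GAB membership.
registered_nodes="0 1"
gab_members="0"

# If a registered node is absent from GAB, assume a preexisting
# split-brain condition (a stale key from a departed node).
splitbrain=no
for node in $registered_nodes; do
    case " $gab_members " in
        *" $node "*) ;;          # node is a live GAB member
        *) splitbrain=yes ;;     # stale registration detected
    esac
done
echo "$splitbrain"
```

With node 1 registered but not in GAB, the sketch reports a split-brain condition, mirroring the refusal-to-start behavior the text describes.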
Troubleshooting SF Oracle RAC Troubleshooting I/O fencing The warning implies that the local cluster with the cluster ID 57069 has keys. However, the disk also has keys for cluster with ID 48813 which indicates that nodes from the cluster with cluster id 48813 potentially use the same coordinator disk. You can run the following commands to verify whether these disks are used by another cluster. Run the following commands on one of the nodes in the local cluster.
206 Troubleshooting SF Oracle RAC Troubleshooting I/O fencing Registered keys are lost on the coordinator disks If the coordinator disks lose the keys that are registered, the cluster might panic when a cluster reconfiguration occurs. To refresh the missing keys ◆ Use the vxfenswap utility to replace the coordinator disks with the same disks. The vxfenswap utility registers the missing keys during the disk replacement. See “Refreshing lost keys on coordinator disks” on page 130.
4  Stop I/O fencing on each node:

   # /sbin/init.d/vxfen stop

   This removes any registration keys on the disks.

5  Import the coordinator disk group. The file /etc/vxfendg includes the name of the disk group (typically, vxfencoorddg) that contains the coordinator disks, so use the command:

   # vxdg -tfC import `cat /etc/vxfendg`

   where:
   -t specifies that the disk group is imported only until the node restarts.
9  After replacing disks in a coordinator disk group, deport the disk group:

   # vxdg deport `cat /etc/vxfendg`

10 On each node, start the I/O fencing driver:

   # /sbin/init.d/vxfen start

11 On each node, start the VCSMM driver:

   # /sbin/init.d/vcsmm start

12 Verify that the I/O fencing module has started and is enabled:

   # gabconfig -a

   Make sure that port b and port o memberships exist in the output for all nodes in the cluster.
The warning implies that the local cluster with the cluster ID 57069 has keys. However, the disk also has keys for the cluster with ID 48813, which indicates that nodes from that cluster potentially use the same coordinator disk. You can run the following commands to verify whether these disks are used by another cluster. Run the following commands on one of the nodes in the local cluster.
210 Troubleshooting SF Oracle RAC Troubleshooting I/O fencing Table 3-19 Troubleshooting I/O fencing warning messages (continued) Warning Possible causes Recommendation SCSI3 write-exclusive reservation is missing on shared disk (disk_name) The SCSI3 reservation is accidentally removed. Register and reserve the key again.
Troubleshooting SF Oracle RAC Troubleshooting I/O fencing ■ If the vxcpserv process fails on the CP server, then review the following diagnostic files: ■ /var/VRTScps/diag/FFDC_CPS_pid_vxcpserv.log ■ /var/VRTScps/diag/stack_pid_vxcpserv.txt Note: If the vxcpserv process fails on the CP server, these files are present in addition to a core file. VCS restarts vxcpserv process automatically in such situations. The file /var/VRTSvcs/log/vxfen/vxfend_[ABC].
212 Troubleshooting SF Oracle RAC Troubleshooting I/O fencing You must have set the environment variables CPS_USERNAME and CPS_DOMAINTYPE to run the cpsadm command on the SF Oracle RAC cluster (client cluster) nodes. To check the connectivity of CP server ◆ Run the following command to check whether a CP server is up and running at a process level: # cpsadm -s cp_server -a ping_cps where cp_server is the virtual IP address or virtual hostname on which the CP server is listening.
Table 3-19    Troubleshooting I/O fencing warning messages (continued)

Warning: VxFEN is running with only one coordinator disk. Loss of this disk will prevent
Possible causes:
■ I/O fencing is enabled with only one coordinator disk.
■ The number of disks configured for I/O fencing is even.
Recommendation:
■ Configure a minimum of three coordinator disks.
■ Add or remove coordinator disks to meet the odd coordinator disk criterion.
Table 3-19    Troubleshooting I/O fencing warning messages (continued)

Warning: SCSI3 write-exclusive reservation is missing on shared disk (disk_name)
Possible causes: The SCSI3 reservation was accidentally removed.
Recommendation: Register and reserve the key again.
Troubleshooting SF Oracle RAC Troubleshooting I/O fencing ■ The coordination points listed in the /etc/vxfenmode file on the different SF Oracle RAC cluster nodes are not the same. If different coordination points are listed in the /etc/vxfenmode file on the cluster nodes, then the operation fails due to failure during the coordination point snapshot check. ■ There is no network connectivity from one or more SF Oracle RAC cluster nodes to the CP server(s).
You must have set the environment variables CPS_USERNAME and CPS_DOMAINTYPE to run the cpsadm command on the SF Oracle RAC cluster (client cluster) nodes.

To check the connectivity of CP server

◆ Run the following command to check whether a CP server is up and running at a process level:

  # cpsadm -s cp_server -a ping_cps

  where cp_server is the virtual IP address or virtual hostname on which the CP server is listening.
Troubleshooting SF Oracle RAC Troubleshooting I/O fencing To troubleshoot server-based I/O fencing configuration in mixed mode 1 Review the current I/O fencing configuration by accessing and viewing the information in the vxfenmode file. Enter the following command on one of the SF Oracle RAC cluster nodes: # cat /etc/vxfenmode vxfen_mode=customized vxfen_mechanism=cps scsi3_disk_policy=dmp security=0 cps1=[10.140.94.101]:14250 vxfendg=vxfencoorddg 2 Review the I/O fencing cluster information.
218 Troubleshooting SF Oracle RAC Troubleshooting I/O fencing 3 Review the SCSI registration keys for the coordinator disks used in the I/O fencing configuration. The variables disk_7 and disk_8 in the following commands represent the disk names in your setup. Enter the vxfenadm -s command on each of the SF Oracle RAC cluster nodes.
Troubleshooting SF Oracle RAC Troubleshooting I/O fencing 4 Review the CP server information about the cluster nodes. On the CP server, run the cpsadm list nodes command to review a list of nodes in the cluster. # cpsadm -s cp_server -a list_nodes where cp server is the virtual IP address or virtual hostname on which the CP server is listening. 5 Review the CP server list membership. On the CP server, run the following command to review the list membership.
Understanding error messages

VCS generates two error message logs: the engine log and the agent log. Log file names are appended by letters. Letter A indicates the first log file, B the second, C the third, and so on. The engine log is located at /var/VRTSvcs/log/engine_A.log. The format of engine log messages is:

Timestamp (Year/MM/DD) | Mnemonic | Severity | UMI | Message Text

■ Timestamp: The date and time the message was generated.
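The fields of an engine log message can be split mechanically when scripting log analysis. The sketch below extracts the severity from a single hypothetical log line laid out in the documented format (timestamp, mnemonic, severity, UMI, message text); the UMI and message shown are illustrative, not taken from a real log.

```shell
# Hypothetical engine log line in the documented format.
logline='2011/04/12 10:32:45 VCS ERROR V-16-1-10600 Cannot connect to VCS engine'

# Whitespace fields: $1-$2 timestamp, $3 mnemonic, $4 severity, $5 UMI.
severity=$(printf '%s\n' "$logline" | awk '{print $4}')
echo "$severity"
```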
Troubleshooting SF Oracle RAC Troubleshooting CVM Restoring communication between host and disks after cable disconnection If a fiber cable is inadvertently disconnected between the host and a disk, you can restore communication between the host and the disk without restarting. To restore lost cable communication between host and disk 1 Reconnect the cable. 2 On all nodes, use the ioscan -funC disk command to scan for new disks. It may take a few minutes before the host is capable of seeing the disk.
To verify the vxfen driver is configured

◆ Check the GAB ports with the command:

  # gabconfig -a

  Port b must exist on the local system.

Error importing shared disk groups

The following message may appear when importing a shared disk group:

VxVM vxdg ERROR V-5-1-587 Disk group disk group name: import failed: No valid disk found containing disk group

You may need to remove keys written to the disk.
Troubleshooting SF Oracle RAC Troubleshooting CVM To resolve the issue if cssd is configured as a critical resource 1 Log onto one of the nodes in the existing cluster as the root user. 2 Configure the cssd resource as a non-critical resource in the cvm group: # haconf -makerw # hares -modify cssd Critical 0 # haconf -dump -makero To resolve the issue if other resources in the group are not online 1 Log onto one of the nodes in the existing cluster as the root user.
Troubleshooting VCSIPC

This section discusses troubleshooting VCSIPC problems.

VCSIPC wait warning messages in Oracle trace/log files

When Gigabit Ethernet interconnections are used, a high load can cause LMX/LLT to flow-control VCSIPC, resulting in warning messages being reported in the Oracle trace file.
Troubleshooting SF Oracle RAC Troubleshooting Oracle or, Reporting communication error with node Check whether the Oracle Real Application Cluster instance on the other system is still running or has been restarted. The warning message indicates that the VCSIPC/LMX connection is no longer valid. Troubleshooting Oracle This section discusses troubleshooting Oracle.
OCSSD core dump file

Core dumps from the ocssd and the pid for the css daemon whose death is treated as fatal are located here. If there are any abnormal restarts of css, the core files are found here.
Troubleshooting SF Oracle RAC Troubleshooting Oracle The process fails because the Veritas skgxn library is copied directly to the Oracle Clusterware home directory ($CRS_HOME/lib) instead of linking the library in the Oracle Clusterware home directory to the library /opt/nmapi/nmapi2/lib/hpux64/libnmapi2.so. To resolve the issue, create a symbolic link for the library from the Oracle Clusterware home directory to the library /opt/nmapi/nmapi2/lib/hpux64/libnmapi2.
228 Troubleshooting SF Oracle RAC Troubleshooting Oracle /var/VRTSvcs/log/engine_A.log /var/VRTSvcs/log/Oracle_A.log Resolving ASYNCH_IO errors If ASYNCH_IO errors occur during select and update queries on the Oracle database, the workaround involves setting the MLOCK privilege for the dba user.
Troubleshooting SF Oracle RAC Troubleshooting Oracle String value in the file: Oracle CSSD failure. Rebooting for cluster integrity String value in the file: Waiting for file system containing String value in the file: Oracle Cluster Ready Services disabled by corrupt install String value in the file: OCR initialization failed accessing OCR device Oracle Clusterware may fail due to Oracle CSSD failure.
To remove Oracle Clusterware

1  Run the rootdelete.sh script (in this example, $CRS_HOME is '/crshome'):

   # cd /crshome/install
   # ./rootdelete.sh

   Run the rootdeinstall.sh script:

   # cd /crshome/install
   # ./rootdeinstall.sh

2  Copy the file inittab.orig back to its original name and remove the other init files:

   # cd /sbin/init.d
   # cp inittab.orig inittab
   # rm init.crs init.crsd init.cssd init.evmd
   # rm /sbin/rc2.d/K96init.crs
   # rm /sbin/rc2.d/S96init.
5 Remove files from the OCR and voting disk directories. For our example:
# rm /ocrvote/ocr
# rm /ocrvote/vote-disk
If OCR and voting disk storage are on raw volumes, use commands resembling:
# dd if=/dev/zero of=/dev/vx/rdsk/ocrvotedg/ocrvol bs=8192 \ count=18000
# dd if=/dev/zero of=/dev/vx/rdsk/ocrvotedg/votvol bs=8192 \ count=3000
6 Reboot the systems to make sure no CRS daemons are running.
$ srvctl start nodeapps -n node_name

Loss of connectivity to OCR and voting disk causes the cluster to panic
If the CVM master node loses connectivity to the SAN, the default settings for the disk detach policy (global) and the disk group fail policy (dgdisable) cause the Oracle Cluster Registry (OCR) and voting disk to be disabled on all nodes, thereby panicking the cluster.
Table 3-22 Troubleshooting Oracle Clusterware warning messages (continued)
Warning: No CSSD resource is configured under VCS.
Possible causes: The CSSD resource is not configured under VCS.
Recommendation: Configure the CSSD resource under VCS and bring the resource online. For instructions, see the Veritas Storage Foundation for Oracle RAC Installation and Configuration Guide.
Warning: The CSSD resource name is not running.
Possible causes: ■ VCS is not running.
Troubleshooting ODM
This section discusses troubleshooting ODM.
File System configured incorrectly for ODM shuts down Oracle
Linking Oracle RAC with the Veritas ODM libraries provides the best file system performance. Review the instructions on creating the link and confirming that Oracle uses the libraries. Shared file systems in RAC clusters without ODM libraries linked to Oracle RAC may exhibit slow performance and are not supported.
Chapter 4 Prevention and recovery strategies This chapter includes the following topics: ■ Verification of GAB ports in SF Oracle RAC cluster ■ Examining GAB seed membership ■ Manual GAB membership seeding ■ Evaluating VCS I/O fencing ports ■ Verifying normal functioning of VCS I/O fencing ■ Managing SCSI-3 PR keys in SF Oracle RAC cluster ■ Identifying a faulty coordinator LUN ■ Starting shared volumes manually ■ Listing all the CVM shared disks ■ Failure scenarios and recovery strategies for CP server setup
■ VCS ('had')
■ vcsmm (membership module for SF Oracle RAC)
■ CVM (kernel messaging)
■ CVM (vxconfigd)
■ CVM (to ship commands from slave node to master node)
The following command can be used to verify the state of GAB ports:
# gabconfig -a
GAB Port Memberships
Port a gen 7e6e7e05
Port b gen 58039502
Port d gen 588a7d02
Port f gen 1ea84702
Port h gen cf430b02
Port o gen de8f0202
Port u gen de4f0203
Port v gen db
Port a gen 7e6e7e01 membership 01
In this case, 7e6e7e01 indicates the "membership generation number" and 01 corresponds to the cluster's "node map". All nodes present in the node map reflect the same membership ID, as seen by the following command:
# gabconfig -a | grep "Port a"
The semicolon is used as a placeholder for a node that has left the cluster.
■ Verify that none of the other nodes in the cluster have a port "a" membership
■ Verify that none of the other nodes have any shared disk groups imported
■ Determine why any node that is still running does not have a port "a" membership
Run the following command to manually seed GAB membership:
# gabconfig -cx
Refer to gabconfig(1M) for more details.
GAB INFO V-15-1-20026 Port a registration waiting for seed port membership

Verifying normal functioning of VCS I/O fencing
It is mandatory to have VCS I/O fencing enabled in an SF Oracle RAC cluster to protect against split-brain scenarios.
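A quick way to confirm that fencing is up is to query the VxFEN driver; a sketch (the exact output shape varies by release):

```shell
# Display the fencing mode and current cluster membership as seen by VxFEN
vxfenadm -d
# Confirm the fencing module is registered at GAB port b
gabconfig -a | grep "Port b"
```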
Managing SCSI-3 PR keys in SF Oracle RAC cluster
# vxdg list | grep data
galaxy_data1 enabled,shared,cds 1201715530.28.
Detecting accidental SCSI-3 PR key removal from coordinator LUNs
The keys currently installed on the coordinator disks can be read using the following command:
# vxfenadm -s all -f /etc/vxfentab
There should be a key for each node in the operating cluster on each of the coordinator disks for normal cluster operation. There will be two keys for every node if you have a two-path DMP configuration.
Failure scenarios and recovery strategies for CP server setup
Table 4-1 Failure scenarios and recovery strategy considerations
CP server issue: CP server planned replacement
Description: After you configure server-based I/O fencing, you may need to replace the CP servers. As an administrator, you can perform a planned replacement of a CP server with either another CP server or a SCSI-3 disk without incurring application downtime on the SF Oracle RAC cluster.
Chapter 5 Tunable parameters This chapter includes the following topics: ■ About SF Oracle RAC tunable parameters ■ About GAB tunable parameters ■ About LLT tunable parameters ■ About LMX tunable parameters ■ About VXFEN tunable parameters ■ Tuning guidelines for campus clusters About SF Oracle RAC tunable parameters Tunable parameters can be configured to enhance the performance of specific SF Oracle RAC features.
Warning: Do not adjust the SF Oracle RAC tunable parameters for LMX and VXFEN as described below to enhance performance without assistance from Symantec support personnel.
About GAB tunable parameters
GAB provides various configuration and tunable parameters to modify and control the behavior of the GAB module.
Table 5-1 GAB static tunable parameters (continued)
flowctrl — Default: 128; Range: 1-1024. Number of pending messages in GAB queues (send or receive) before GAB hits flow control. This can be overwritten while the cluster is up and running with the gabconfig -Q option. Use the gabconfig command to control the value of this tunable.
Table 5-1 GAB static tunable parameters (continued)
gab_ibuf_count — Default: 8; Range: 0-32. Determines whether the GAB logging daemon is enabled or disabled. The GAB logging daemon is enabled by default. To disable it, change the value of gab_ibuf_count to 0. This can be overwritten while the cluster is up and running with the gabconfig -K option.
Table 5-2 GAB dynamic tunable parameters
Control port seed — Default: Disabled. This option defines the minimum number of nodes that can form the cluster and controls the forming of the cluster. If the number of nodes in the cluster is less than the number specified in the gabtab file, then the cluster will not form.
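For example, to require at least four nodes before the cluster seeds, the /etc/gabtab entry might look like this (the node count is illustrative only):

```shell
# /etc/gabtab -- seed only after four nodes have joined (example value)
/sbin/gabconfig -c -n4
```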
Table 5-2 GAB dynamic tunable parameters (continued)
Missed heartbeat halt — Default: Disabled. If this option is enabled, the system panics on missing the first heartbeat from the VCS engine or the vxconfigd daemon in a CVM environment. The default is to disable the immediate panic.
Table 5-2 GAB dynamic tunable parameters (continued)
Halt on rejoin — Default: Disabled. This option allows the user to configure the behavior of the VCS engine or any other user process when one or more nodes rejoin a cluster after a network partition. By default, GAB does not panic the node running the VCS engine; GAB kills the userland process (the VCS engine or the vxconfigd process).
Table 5-2 GAB dynamic tunable parameters (continued)
Quorum flag — Default: Disabled. This is an option in GAB that allows a node to IOFENCE (resulting in a panic) if the new membership set is less than 50% of the old membership set.
Table 5-2 GAB dynamic tunable parameters (continued)
Stable timeout — Default: 5000 (ms). Specifies the time GAB waits to reconfigure membership after the last report from LLT of a change in the state of local node connections for a given port. Any change in the state of connections restarts the GAB waiting period.
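The stable timeout can be adjusted at runtime with gabconfig; a hedged sketch (the -t option and millisecond units are assumptions to verify against the gabconfig(1M) manual page):

```shell
# Set the GAB stable timeout to 5000 ms (the default value shown above)
gabconfig -t 5000
```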
command on all the nodes in the cluster to change the values of the parameters. To set an LLT parameter across system reboots, you must include the parameter definition in the /etc/llttab file. Default values of the parameters are used if nothing is specified in /etc/llttab. The parameter values specified in the /etc/llttab file take effect at LLT start time only.
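An /etc/llttab fragment using set-timer directives might look like the following; the values shown are the documented defaults from the table below, not tuning recommendations:

```shell
# Example /etc/llttab timer overrides -- values are the documented defaults
set-timer peerinact:1600
set-timer timetosendhb:200
```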
Table 5-3 LLT timer tunable parameters
peerinact — Default: 1600. LLT marks a link of a peer node as "inactive" if it does not receive any packet on that link for this timer interval. Once a link is marked as "inactive," LLT does not send any data on that link.
peertrouble — LLT marks a high-pri link of a peer node as "troubled" if it does not receive any packet on that link for this timer interval.
Table 5-3 LLT timer tunable parameters (continued)
peertroublelo — Default: 400. LLT marks a low-pri link of a peer node as "troubled" if it does not receive any packet on that link for this timer interval. Once a link is marked as "troubled," LLT does not send any data on that link until the link is available.
Table 5-3 LLT timer tunable parameters (continued)
timetosendhb — Default: 200. LLT sends out-of-timer-context heartbeats to keep the node alive when the LLT timer does not run at regular intervals. This option specifies the amount of time to wait before sending a heartbeat if the timer is not running. If this tunable is set to 0, the out-of-timer-context heartbeating mechanism is disabled.
Table 5-3 LLT timer tunable parameters (continued)
service — Default: 100. LLT calls its service routine (which delivers messages to LLT clients) after every service timer interval. Do not change this value for performance reasons.
arp — LLT flushes the stored addresses of peer nodes when this timer expires and relearns the addresses.
Table 5-4 LLT flow control tunable parameters (continued)
lowwater — Default: 100. When LLT has flow-controlled the client, it does not start accepting packets again until the number of packets in the port transmit queue for a node drops to lowwater.
rporthighwater — Default: 200. When the number of packets in the receive queue for a port reaches highwater, LLT is flow-controlled.
Table 5-4 LLT flow control tunable parameters (continued)
linkburst — Default: 32. This flow control value should not be higher than the difference between the highwater and lowwater flow control values. For performance reasons, its value should be either 0 or at least 32.
Table 5-5 LMX tunable parameters
lmx_minor_max — Default: 8192; Maximum: 65535. Specifies the maximum number of contexts system-wide. Each Oracle process typically has two LMX contexts. "Contexts" and "minors" are used interchangeably in the documentation; "context" is an Oracle-specific term used to specify the value in the lmx.conf file.
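Since each Oracle process typically holds two LMX contexts, a rough sizing check against lmx_minor_max can be sketched as follows; the process count is an illustrative assumption, not a measurement:

```shell
# Rough sizing: contexts needed is about twice the number of Oracle processes.
oracle_procs=1500                      # illustrative process count (assumption)
contexts_needed=$((oracle_procs * 2))
echo "$contexts_needed"                # 3000, well under the 8192 default
```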
To reconfigure the LMX module
This section discusses how to reconfigure the LMX module on the node. For the parameter changes to take effect, you must reconfigure the LMX module.
1 Configure the tunable parameter.
About VXFEN tunable parameters
This section describes the VXFEN tunable parameters and how to reconfigure the VXFEN module. Table 5-6 describes the tunable parameters for the VXFEN driver.
Table 5-6 VXFEN tunable parameters (continued)
vxfen_panic_time — Specifies the time in seconds that the I/O fencing driver VxFEN passes to the GAB module to wait until fencing completes its arbitration before GAB implements its decision in the event of a split-brain.
3 Unmount CFS mounts (if the mounts are not under VCS control). Determine the file systems to unmount by checking the /etc/mnttab file, or run:
# mount | grep vxfs | grep cluster
To unmount the mount points listed in the output, enter:
# umount mount_point
4 Stop VCS:
# hastop -local
5 Check that this node is registered at GAB ports a, b, d, and o only. Ports f, h, v, and w should not be seen on this node.
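The port check in step 5 can be done with gabconfig; a sketch:

```shell
# List GAB port registrations on this node; only ports a, b, d, and o should appear
gabconfig -a
# Flag ports that should be gone (f, h, v, w) if any are still registered
gabconfig -a | egrep "Port (f|h|v|w)" && echo "unexpected port still registered"
```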
Section Reference
■ Appendix A. List of SF Oracle RAC health checks
■ Appendix B. Error messages
Appendix A List of SF Oracle RAC health checks This appendix includes the following topics: ■ LLT health checks ■ LMX health checks ■ I/O fencing health checks ■ PrivNIC health checks ■ Oracle Clusterware health checks ■ CVM, CFS, and ODM health checks LLT health checks This section lists the health checks performed for LLT, the messages displayed for each check, and a brief description of the check. Note: Warning messages indicate issues in the components or the general health of the cluster.
Table A-1 List of health checks for LLT
Check: LLT timer subsystem scheduling check
Message: Warning: OS timer is not called for num seconds
Description: Checks whether the LLT module runs in accordance with the scheduled operating system timer. The message indicates that the operating system timer is not called for the specified interval. The parameter timer_threshold contains the optimum threshold for this check.
Table A-1 List of health checks for LLT (continued)
Check: Flaky link monitoring
Message: Warning: Connectivity with node node id on link link id is flaky num time(s).
Description: Checks whether the private interconnects are stable. The message indicates that connectivity (link) with the peer node (node id) is monitored (num) times within a stipulated duration (for example, 0-4 seconds).
Table A-1 List of health checks for LLT (continued)
Check: LLT packet related checks
Messages: Retransmitted % percentage of total transmitted packets. Sent % percentage of total transmitted packets when no link is up.
Description: Checks whether the data packets transmitted by LLT reach peer nodes without any error. If there is an error in packet transmission, it indicates an error in the private interconnects.
Table A-1 List of health checks for LLT (continued)
Check: Traffic distribution over links
Message: Traffic distribution over links: %% Send data on link num percentage; %% Recv data on link num percentage
Description: Checks the distribution of traffic over all the links configured under LLT. The message displays the percentage of data sent and received on a particular link (num). This check does not require a threshold.
Table A-2 List of health checks for LMX
Check: Checks related to the working status of the LMX/VCSMM drivers
Messages: LMX is not running. (This warning is not applicable for clusters running Oracle RAC 11g.) VCSMM is not running.
Description: Checks whether VCSMM/LMX is running. The LMX warning is valid only for clusters running Oracle RAC 10g.
I/O fencing health checks
Note: Warning messages indicate issues in the components or the general health of the cluster. For recommendations on resolving the issues, see the troubleshooting chapter in this document.
Table A-3 lists the health checks performed for I/O fencing.
Table A-3 List of health checks for I/O fencing
Check: Checks related to the working status of I/O fencing
Message: VxFEN is not configured in SCSI3 mode.
Table A-3 List of health checks for I/O fencing (continued)
Check: Verify SCSI3 write-exclusive reservation on shared disk
Message: SCSI3 write-exclusive reservation is missing on shared disk (disk_name)
Description: Checks if the shared disk is accessible to the node for write operations.
Table A-4 List of health checks for PrivNIC (continued)
Check: Compare the NICs used by PrivNIC with the NICs configured under LLT
Message: For PrivNIC resources: Mismatch between LLT links llt nics and PrivNIC links private nics.
Description: Checks whether the NICs used by the PrivNIC resource have the same interface (private nics) as those configured as LLT links (llt nics).
Table A-5 List of health checks for the Oracle Clusterware module (continued)
Check: Compare the NICs used by CRS with the NICs configured under LLT
Message: Mismatch between LLT links llt_nic1, llt_nic2 and Oracle Clusterware links crs_nic1, crs_nic2.
Description: Checks whether the private interconnects used by Oracle Clusterware are the same as the LLT links (llt nics).
Table A-6 List of health checks for CVM, CFS, and ODM (continued)
Check: Verify CFS status — Message: CFS is not running — Description: Checks whether CFS is running in the cluster.
Check: Verify ODM status — Message: ODM is not running — Description: Checks whether ODM is running in the cluster.
Appendix B Error messages This appendix includes the following topics: ■ About error messages ■ LMX error messages ■ VxVM error messages ■ VXFEN driver error messages About error messages Error messages can be generated by the following software modules: ■ LLT Multiplexer (LMX) ■ Veritas Volume Manager (VxVM) ■ Veritas Fencing (VXFEN) driver LMX error messages There are two types of LMX error messages: critical and non-critical.
Table B-1 LMX critical error messages
Message ID — LMX Message
00001 lmxload packet header size incorrect (number)
00002 lmxload invalid lmx_llt_port number
00003 lmxload context memory alloc failed
00004 lmxload port memory alloc failed
00005 lmxload buffer memory alloc failed
00006 lmxload node memory alloc failed
00007 lmxload msgbuf memory alloc failed
00008 lmxload tmp msgbuf memory alloc failed
00009 lmxunload node number conngrp not NULL
000
Error messages LMX error messages # lmxconfig -e 0 To re-enable message displays, type: # lmxconfig -e 1 Table B-2 contains LMX error messages that may appear during run-time.
VxVM error messages
Table B-3 contains VxVM error messages that are related to I/O fencing.
Table B-3 VxVM error messages for I/O fencing
Message: vold_pgr_register(disk_path): failed to open the vxfen device. Please make sure that the vxfen driver is installed and configured.
Explanation: The vxfen driver is not configured. Follow the instructions to set up these disks and start I/O fencing.
Error messages VXFEN driver error messages VXFEN driver informational message The following informational message appears when a node is ejected from the cluster to prevent data corruption when a split-brain occurs. VXFEN CRITICAL V-11-1-20 Local cluster node ejected from cluster to prevent potential data corruption Node ejection informational messages Informational messages may appear on the console of one of the cluster nodes when a node is ejected from a disk or LUN.
Glossary Agent A process that starts, stops, and monitors all configured resources of a type, and reports their status to VCS. Authentication Broker The Veritas Security Services component that serves, one level beneath the root broker, as an intermediate registration authority and a certification authority. The authentication broker can authenticate clients, such as users or services, and grant them a certificate that will become part of the Veritas credential.
classes when they meet specified naming, timing, access rate, and storage capacity-related conditions. See also Veritas File System (VxFS).
Failover A failover occurs when a service group faults and is migrated to another system.
GAB (Group Atomic Broadcast) A communication mechanism of the VCS engine that manages cluster membership, monitors heartbeat communication, and distributes information throughout the cluster.
Glossary mirroring A form of storage redundancy in which two or more identical copies of data are maintained on separate volumes. (Each duplicate copy is known as a mirror.) Also RAID Level 1. Node The physical host or system on which applications and service groups reside. When systems are linked by VCS, they become nodes in a cluster. resources Individual components that work together to provide application services to the public network.
State The current activity status of a resource, group or system. Resource states are given relative to both systems. Storage Checkpoint A facility that provides a consistent and stable view of a file system or database image and keeps track of modified data blocks since the last Storage Checkpoint. System The physical system on which applications and service groups reside. When a system is linked by VCS, it becomes a node in a cluster. See Node types.
Index A agents intelligent resource monitoring 36 poll-based resource monitoring 36 AMF driver 36 C Changing the CVM master 151 cluster Group Membership Services/Atomic Broadcast (GAB) 28 interconnect communication channel 25 Cluster File System (CFS) architecture 32 communication 33 overview 32 Cluster master node changing 151 Cluster Volume Manager (CVM) architecture 30 communication 31 overview 30 commands format (verify disks) 221 vxdctl enable (scan disks) 221 communication communication stack 24 dat
290 Index GAB tunable parameters (continued) dynamic (continued) Keep on killing 246 Kill_ntries 246 Missed heartbeat halt 246 Quorum flag 246 Stable timeout 246 static 244 flowctrl 244 gab_conn_wait 244 gab_kill_ntries 244 gab_kstat_size 244 isolate_time 244 logbufsize 244 msglogsize 244 numnids 244 numports 244 getcomms troubleshooting 166 getdbac troubleshooting script 166 H hagetcf (troubleshooting script) 166 log files 210 M MANPATH environment variable 74 Master node changing 151 messages LMX err
Index SFRAC tunable parameters 243 Switching the CVM master 151 T troubleshooting CVMVolDg 223 error when starting Oracle instance 227 File System Configured Incorrectly for ODM 234 getcomms 166 troubleshooting script 166 getdbac 166 hagetcf 166 Oracle log files 227 overview of topics 220, 224, 234 restoring communication after cable disconnection 221 running scripts for analysis 166 scripts 166 SCSI reservation errors during bootup 201 shared disk group cannot be imported 221 V VCSIPC errors in Oracle t