vSphere Availability
17 APR 2018
VMware vSphere 6.7
VMware ESXi 6.7
vCenter Server 6.7
You can find the most up-to-date technical documentation on the VMware website at https://docs.vmware.com/. If you have comments about this documentation, submit your feedback to docfeedback@vmware.com.
VMware, Inc., 3401 Hillview Ave., Palo Alto, CA 94304, www.vmware.com
Copyright © 2009–2018 VMware, Inc. All rights reserved. Copyright and trademark information.
Contents
About vSphere Availability
1 Business Continuity and Minimizing Downtime
  Reducing Planned Downtime
  Preventing Unplanned Downtime
  vSphere HA Provides Rapid Recovery from Outages
  vSphere Fault Tolerance Provides Continuous Availability
  Protecting the vCenter Server Appliance with vCenter High Availability
  Protecting vCenter Server with VMware Service Lifecycle Manager
2 Creating and Using vSphere HA Clusters
  How vSphere HA Works
  vSphere HA Admission Control
5 Using Microsoft Clustering Service for vCenter Server on Windows High Availability
  Benefits and Limitations of Using MSCS
  Upgrade vCenter Server in an MSCS Environment
  Configure MSCS for High Availability
About vSphere Availability
vSphere Availability describes solutions that provide business continuity, including how to establish vSphere® High Availability (HA) and vSphere Fault Tolerance.
Intended Audience
This information is for anyone who wants to provide business continuity through the vSphere HA and Fault Tolerance solutions. The information in this book is for experienced Windows or Linux system administrators who are familiar with virtual machine technology and data center operations.
Business Continuity and Minimizing Downtime 1 Downtime, whether planned or unplanned, brings considerable costs. However, solutions that ensure higher levels of availability have traditionally been costly, hard to implement, and difficult to manage. VMware software makes it simpler and less expensive to provide higher levels of availability for important applications.
vSphere Availability vSphere makes it possible for organizations to dramatically reduce planned downtime. Because workloads in a vSphere environment can be dynamically moved to different physical servers without downtime or service interruption, server maintenance can be performed without requiring application and service downtime. With vSphere, organizations can: n Eliminate downtime for common maintenance operations. n Eliminate planned maintenance windows.
vSphere Availability n It protects against datastore accessibility failures by restarting affected virtual machines on other hosts which still have access to their datastores. n It protects virtual machines against network isolation by restarting them if their host becomes isolated on the management or vSAN network. This protection is provided even if the network has become partitioned.
vSphere Availability If either the host running the Primary VM or the host running the Secondary VM fails, an immediate and transparent failover occurs. The functioning ESXi host seamlessly becomes the Primary VM host without losing network connections or in-progress transactions. With transparent failover, there is no data loss and network connections are maintained. After a transparent failover occurs, a new Secondary VM is respawned and redundancy is re-established.
Option: Basic
Description: The Basic option clones the Active node to the Passive node and Witness node, and configures the nodes for you. If your environment meets one of the following requirements, you can use this option.
n Either the vCenter Server Appliance that becomes the Active node is managing its own ESXi host and its own virtual machine. This configuration is sometimes called a self-managed vCenter Server.
Creating and Using vSphere HA Clusters 2 vSphere HA clusters enable a collection of ESXi hosts to work together so that, as a group, they provide higher levels of availability for virtual machines than each ESXi host can provide individually. When you plan the creation and usage of a new vSphere HA cluster, the options you select affect the way that cluster responds to failures of hosts or virtual machines.
vSphere Availability When you create a vSphere HA cluster, a single host is automatically elected as the master host. The master host communicates with vCenter Server and monitors the state of all protected virtual machines and of the slave hosts. Different types of host failures are possible, and the master host must detect and appropriately deal with the failure. The master host must distinguish between a failed host and one that is in a network partition or that has become network isolated.
Host Failure Types
The master host of a VMware vSphere® High Availability cluster is responsible for detecting the failure of subordinate hosts. Depending on the type of failure detected, the virtual machines running on the hosts might need to be failed over.
In a vSphere HA cluster, three types of host failure are detected:
n Failure. A host stops functioning.
n Isolation. A host becomes network isolated.
n Partition. A host loses network connectivity with the master host.
vSphere Availability If a Proactive HA failure occurs, you can automate the remediation action taken in the vSphere Availability section of the vSphere Client. The VMs on the affected host can be evacuated to other hosts and the host is either placed in Quarantine mode or Maintenance mode. Note Your cluster must use vSphere DRS for the Proactive HA failure monitoring to work.
vSphere Availability A virtual machine "split-brain" condition can occur when a host becomes isolated or partitioned from a master host and the master host cannot communicate with it using heartbeat datastores. In this situation, the master host cannot determine that the host is alive and so declares it dead. The master host then attempts to restart the virtual machines that are running on the isolated or partitioned host.
vSphere Availability Host limits In addition to resource reservations, a virtual machine can only be placed on a host if doing so does not violate the maximum number of allowed virtual machines or the number of in-use vCPUs. Feature constraints If the advanced option has been set that requires vSphere HA to enforce VM to VM anti-affinity rules, vSphere HA does not violate this rule. Also, vSphere HA does not violate any configured per host limits for fault tolerant virtual machines.
You can also specify custom values for both monitoring sensitivity and the I/O stats interval by selecting the Custom checkbox.
Table 2-1. VM Monitoring Settings
Setting    Failure Interval (seconds)    Reset Period
High       30                            1 hour
Medium     60                            24 hours
Low        120                           7 days
After failures are detected, vSphere HA resets virtual machines. The reset ensures that services remain available.
Configuring VMCP
VM Component Protection is configured in the vSphere Client. Go to the Configure tab and click vSphere Availability and Edit. Under Failures and Responses you can select Datastore with PDL or Datastore with APD. The storage protection levels you can choose and the virtual machine remediation actions available differ depending on the type of datastore accessibility failure.
PDL Failures
Under Datastore with PDL, you can select Issue events or Power off and restart VMs.
VMware vCenter Server® selects a preferred set of datastores for heartbeating. This selection is made to maximize the number of hosts that have access to a heartbeating datastore and minimize the likelihood that the datastores are backed by the same LUN or NFS server. You can use the advanced option das.heartbeatdsperhost to change the number of heartbeat datastores selected by vCenter Server for each host. The default is two and the maximum valid value is five.
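If you configure heartbeat datastores programmatically rather than in the vSphere Client, the same preferences can be expressed through the cluster's HA configuration. The following pyVmomi sketch is illustrative and not part of this guide: the host name, credentials, and datastore names are placeholders, and the hBDatastoreCandidatePolicy value is my assumption for "use the specified list and complement automatically if needed".

```python
# Minimal pyVmomi sketch: pre-select heartbeat datastores and set
# das.heartbeatdsperhost on a cluster. Placeholders throughout.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "HA-Cluster")

# Preferred heartbeat datastores, picked by name from the cluster's datastores.
preferred = [ds for ds in cluster.datastore if ds.name in ("ds01", "ds02")]
spec = vim.cluster.ConfigSpecEx(
    dasConfig=vim.cluster.DasConfigInfo(
        heartbeatDatastore=preferred,
        hBDatastoreCandidatePolicy="allFeasibleDsWithUserPreference",
        option=[vim.option.OptionValue(key="das.heartbeatdsperhost", value="3")]))
task = cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
# In real code, wait for the task to complete before disconnecting.
Disconnect(si)
```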
n For legacy ESXi 4.x hosts, vSphere HA writes to /var/log/vmware/fdm on local disk, as well as syslog if it is configured.
n For legacy ESX 4.x hosts, vSphere HA writes to /var/log/vmware/fdm.
Secure vSphere HA logins
vSphere HA logs onto the vSphere HA agents using a user account, vpxuser, created by vCenter Server. This account is the same account used by vCenter Server to manage the host.
vSphere Availability The basis for vSphere HA admission control is how many host failures your cluster is allowed to tolerate and still guarantee failover. The host failover capacity can be set in three ways: n Cluster resource percentage n Slot policy n Dedicated failover hosts Note vSphere HA admission control can be disabled. However, without it you have no assurance that the expected number of virtual machines can be restarted after a failure. Do not permanently disable admission control.
vSphere Availability 4 Determines if either the Current CPU Failover Capacity or Current Memory Failover Capacity is less than the corresponding Configured Failover Capacity (provided by the user). If so, admission control disallows the operation. vSphere HA uses the actual reservations of the virtual machines. If a virtual machine does not have reservations, meaning that the reservation is 0, a default of 0MB memory and 32MHz CPU is applied.
Figure 2-1. Admission Control Example with Percentage of Cluster Resources Reserved Policy
Virtual machines: VM1 (2GHz, 1GB), VM2 (2GHz, 1GB), VM3 (1GHz, 2GB), VM4 (1GHz, 1GB), VM5 (1GHz, 1GB); total resource requirements: 7GHz, 6GB.
Hosts: H1 (9GHz, 9GB), H2 (9GHz, 6GB), H3 (6GHz, 6GB); total host resources: 24GHz, 21GB.
The total resource requirements for the powered-on virtual machines are 7GHz and 6GB. The total host resources available for virtual machines are 24GHz and 21GB.
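The Current Failover Capacity values implied by this example can be checked with a few lines of arithmetic. The sketch below is illustrative rather than part of the original example; it assumes the reservations and host capacities shown in Figure 2-1 and that the resulting percentages are rounded down.

```python
# Current Failover Capacity arithmetic for the percentage-based policy,
# using the values from Figure 2-1.
vm_reservations = [(2, 1), (2, 1), (1, 2), (1, 1), (1, 1)]   # (GHz, GB) per powered-on VM
host_capacity   = [(9, 9), (9, 6), (6, 6)]                    # (GHz, GB) per host

cpu_needed = sum(cpu for cpu, _ in vm_reservations)           # 7 GHz
mem_needed = sum(mem for _, mem in vm_reservations)           # 6 GB
cpu_total  = sum(cpu for cpu, _ in host_capacity)             # 24 GHz
mem_total  = sum(mem for _, mem in host_capacity)             # 21 GB

# Current failover capacity = unreserved resources as a fraction of total resources.
cpu_failover_pct = int((cpu_total - cpu_needed) / cpu_total * 100)   # 70
mem_failover_pct = int((mem_total - mem_needed) / mem_total * 100)   # 71

# Admission control blocks an operation if either value drops below the
# configured failover capacity (25% is used here purely as an example).
configured_pct = 25
allowed = cpu_failover_pct >= configured_pct and mem_failover_pct >= configured_pct
print(cpu_failover_pct, mem_failover_pct, allowed)
```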
4 Determines whether the Current Failover Capacity is less than the Configured Failover Capacity (provided by the user). If it is, admission control disallows the operation.
Note You can set a specific slot size for both CPU and memory in the admission control section of the vSphere HA settings in the vSphere Client.
Slot Size Calculation
vSphere HA Slot Size and Admission Control (http://link.brightcove.
vSphere Availability The Current Failover Capacity is computed by determining how many hosts (starting from the largest) can fail and still leave enough slots to satisfy the requirements of all powered-on virtual machines. Example: Admission Control Using Slot Policy The way that slot size is calculated and used with this admission control policy is shown in an example.
vSphere Availability The largest host is H1 and if it fails, six slots remain in the cluster, which is sufficient for all five of the powered-on virtual machines. If both H1 and H2 fail, only three slots remain, which is insufficient. Therefore, the Current Failover Capacity is one. The cluster has one available slot (the six slots on H2 and H3 minus the five used slots). Dedicated Failover Hosts Admission Control You can configure vSphere HA to designate specific hosts as the failover hosts.
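Returning to the slot-policy example above, the slot accounting can be sketched in a few lines. The 2GHz/2GB slot size (the largest CPU and memory reservation among the five VMs) and the host capacities carried over from Figure 2-1 are assumptions, but they reproduce the slot counts and the Current Failover Capacity of one stated in the example.

```python
# Slot-policy arithmetic matching the example above. Values are illustrative.
slot_cpu, slot_mem = 2, 2                      # GHz, GB
hosts = {"H1": (9, 9), "H2": (9, 6), "H3": (6, 6)}
powered_on_vms = 5

# Slots per host = min(CPU capacity // slot CPU, memory capacity // slot memory)
slots = {name: min(cpu // slot_cpu, mem // slot_mem) for name, (cpu, mem) in hosts.items()}
# {'H1': 4, 'H2': 3, 'H3': 3}

# Current Failover Capacity: how many hosts (largest first) can fail while the
# remaining slots still cover all powered-on virtual machines.
ordered = sorted(slots.values(), reverse=True)
failover_capacity = 0
for failed in range(1, len(ordered)):
    if sum(ordered[failed:]) >= powered_on_vms:
        failover_capacity = failed
    else:
        break
print(slots, failover_capacity)                # Current Failover Capacity is 1
```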
vSphere Availability Networking Differences vSAN has its own network. If vSAN and vSphere HA are enabled for the same cluster, the HA interagent traffic flows over this storage network rather than the management network. vSphere HA uses the management network only if vSAN is disabled. vCenter Server chooses the appropriate network if vSphere HA is configured on a host. Note You can enable vSAN only if vSphere HA is disabled.
vSphere Availability Using vSphere HA and DRS Together Using vSphere HA with Distributed Resource Scheduler (DRS) combines automatic failover with load balancing. This combination can result in a more balanced cluster after vSphere HA has moved virtual machines to different hosts. When vSphere HA performs failover and restarts virtual machines on different hosts, its first priority is the immediate availability of all virtual machines.
n VM-Host affinity rules place specified virtual machines on a particular host or a member of a defined group of hosts during failover actions. When you edit a DRS affinity rule, you must use vSphere HA advanced options to enforce the desired failover behavior for vSphere HA.
n HA must respect VM anti-affinity rules during failover -- When the advanced option for VM anti-affinity rules is set, vSphere HA does not fail over a virtual machine if doing so violates a rule.
In addition to the previous restrictions, the following IPv6 address types are not supported for use with the vSphere HA isolation address or management network: link-local, ORCHID, and link-local with zone indices. Also, the loopback address type cannot be used for the management network.
Note To upgrade an existing IPv4 deployment to IPv6, you must first disable vSphere HA.
n To ensure that any virtual machine can run on any host in the cluster, all hosts must have access to the same virtual machine networks and datastores. Similarly, virtual machines must be located on shared, not local, storage; otherwise, they cannot be failed over in the case of a host failure.
Note vSphere HA uses datastore heartbeating to distinguish between partitioned, isolated, and failed hosts.
Procedure
1 In the vSphere Client, browse to the data center where you want the cluster to reside and click New Cluster.
2 Complete the New Cluster wizard. Do not turn on vSphere HA (or DRS).
3 Click OK to close the wizard and create an empty cluster.
4 Based on your plan for the resources and networking architecture of the cluster, use the vSphere Client to add hosts to the cluster.
5 Browse to the cluster and enable vSphere HA.
6 a Click the Configure tab.
vSphere Availability A vSphere HA-enabled cluster is a prerequisite for vSphere Fault Tolerance. Prerequisites n Verify that all virtual machines and their configuration files reside on shared storage. n Verify that the hosts are configured to access the shared storage so that you can power on the virtual machines by using different hosts in the cluster. n Verify that hosts are configured to have access to the virtual machine network.
8 Click OK.
You have a vSphere HA cluster, populated with hosts.
What to do next
Configure the appropriate vSphere HA settings for your cluster.
n Failures and responses
n Proactive HA Failures and Responses
n Admission Control
n Heartbeat Datastores
n Advanced Options
See Configuring vSphere Availability Settings.
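For scripted deployments, the same create-then-enable workflow can be sketched with pyVmomi. This is a minimal illustration, not the documented procedure; the datacenter and cluster names are placeholders, and content is the ServiceInstance content object obtained as in the earlier heartbeat-datastore sketch.

```python
# Sketch: create an empty cluster, add hosts by other means, then enable vSphere HA.
from pyVmomi import vim

def create_empty_cluster(content, dc_name="Datacenter01", cluster_name="HA-Cluster"):
    # Find the target datacenter and create the cluster without HA or DRS (steps 1-3).
    datacenter = next(dc for dc in content.rootFolder.childEntity
                      if isinstance(dc, vim.Datacenter) and dc.name == dc_name)
    return datacenter.hostFolder.CreateClusterEx(name=cluster_name,
                                                 spec=vim.cluster.ConfigSpecEx())

def enable_vsphere_ha(cluster):
    # Enable vSphere HA with host monitoring on the populated cluster (step 5).
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(enabled=True, hostMonitoring="enabled"))
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```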
vSphere Availability 2 Respond to Host Isolation You can set specific responses to host isolation that occurs in your vSphere HA cluster. 3 Configure VMCP Responses Configure the response that VM Component Protection (VMCP) makes when a datastore encounters a PDL or APD failure. 4 Enable VM Monitoring You can turn on VM and Application Monitoring and also set the monitoring sensitivity for your vSphere HA cluster.
Procedure
1 In the vSphere Client, browse to the vSphere HA cluster.
2 Click the Configure tab.
3 Select vSphere Availability and click Edit.
4 Click Failures and Responses and expand Response for Host Isolation.
5 To configure the host isolation response, select Disabled, Shut down and restart VMs, or Power off and restart VMs.
6 Click OK.
Your setting for the host isolation response takes effect.
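The cluster-default isolation response can also be set through the API. A minimal pyVmomi sketch follows; the mapping of the UI choices to the isolationResponse values ("none", "shutdown", "powerOff") is my assumption, and the cluster object is looked up as in the earlier sketch.

```python
# Sketch: set the cluster-default host isolation response.
from pyVmomi import vim

def set_isolation_response(cluster, response="powerOff"):
    # "none" ~ Disabled, "shutdown" ~ Shut down and restart VMs,
    # "powerOff" ~ Power off and restart VMs (assumed mapping).
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(
            defaultVmSettings=vim.cluster.DasVmSettings(isolationResponse=response)))
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```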
2 Click the Configure tab.
3 Select vSphere Availability and click Edit.
4 Click Failures and Responses and expand VM Monitoring.
5 Select VM Monitoring and Application Monitoring.
These settings turn on VMware Tools heartbeats and application heartbeats, respectively.
6 To set the heartbeat monitoring sensitivity, move the slider between Low and High or select Custom to provide custom settings.
7 Click OK.
Your monitoring settings take effect.
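A comparable pyVmomi sketch for VM and Application Monitoring is shown below. The failureInterval and maxFailureWindow values mirror the High row of Table 2-1; the minUpTime and maxFailures values, and the exact mapping of the UI slider to these fields, are illustrative assumptions.

```python
# Sketch: turn on VM and Application Monitoring with explicit sensitivity values.
from pyVmomi import vim

def enable_vm_monitoring(cluster):
    tools = vim.cluster.VmToolsMonitoringSettings(
        enabled=True,
        vmMonitoring="vmAndAppMonitoring",   # or "vmMonitoringOnly"
        failureInterval=30,                   # seconds without heartbeats (High row)
        minUpTime=120,                        # assumed value
        maxFailures=3,                        # assumed value
        maxFailureWindow=3600)                # reset period in seconds (1 hour)
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(
            vmMonitoring="vmAndAppMonitoring",
            defaultVmSettings=vim.cluster.DasVmSettings(
                vmToolsMonitoringSettings=tools)))
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```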
6 Select from the following configuration options.
Option: Automation Level
Description: Determine whether host quarantine or maintenance mode and VM migrations are recommendations or automatic.
n Manual. vCenter Server suggests migration recommendations for virtual machines.
n Automated. Virtual machines are migrated to healthy hosts and degraded hosts are entered into quarantine or maintenance mode depending on the configured Proactive HA automation level.
Option: Remediation
7 Select an option for Define host failover capacity by.
Option: Cluster resource percentage
Description: Specify a percentage of the cluster’s CPU and memory resources to reserve as spare capacity to support failovers.
Option: Slot Policy (powered-on VMs)
Description: Select a slot size policy that covers all powered-on VMs or is a fixed size. You can also calculate how many VMs require multiple slots.
Option: Dedicated failover hosts
Description: Select hosts to use for failover actions.
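Each of these three options corresponds to an admission control policy object in the vSphere API. The following pyVmomi sketch is illustrative only; apply a single policy per cluster, and treat the field names and values as assumptions to check against the API reference.

```python
# Sketch: the three "Define host failover capacity by" options as API policies.
from pyVmomi import vim

def pct_policy(cpu_pct=25, mem_pct=25):
    # Cluster resource percentage
    return vim.cluster.FailoverResourcesAdmissionControlPolicy(
        cpuFailoverResourcesPercent=cpu_pct, memoryFailoverResourcesPercent=mem_pct)

def slot_policy(failover_level=1, cpu_mhz=None, memory_mb=None):
    # Slot Policy (powered-on VMs); pass cpu_mhz/memory_mb for a fixed slot size
    policy = vim.cluster.FailoverLevelAdmissionControlPolicy(failoverLevel=failover_level)
    if cpu_mhz and memory_mb:
        policy.slotPolicy = vim.cluster.FixedSizeSlotPolicy(cpu=cpu_mhz, memory=memory_mb)
    return policy

def dedicated_hosts_policy(failover_hosts):
    # Dedicated failover hosts; expects a list of vim.HostSystem objects
    return vim.cluster.FailoverHostAdmissionControlPolicy(failoverHosts=failover_hosts)

def apply_admission_control(cluster, policy):
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(
            admissionControlEnabled=True, admissionControlPolicy=policy))
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```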
6 In the Available heartbeat datastores pane, select the datastores that you want to use for heartbeating.
The listed datastores are shared by more than one host in the vSphere HA cluster. When a datastore is selected, the lower pane displays all the hosts in the vSphere HA cluster that can access it.
7 Click OK.
Set Advanced Options
To customize vSphere HA behavior, set advanced vSphere HA options.
Prerequisites
Verify that you have cluster administrator privileges.
vSphere Availability Table 2‑4. vSphere HA Advanced Options Option Description das.isolationaddress[...] Sets the address to ping to determine if a host is isolated from the network. This address is pinged only when heartbeats are not received from any other host in the cluster. If not specified, the default gateway of the management network is used. This default gateway has to be a reliable address that is available, so that the host can determine if it is isolated from the network.
Table 2-4. vSphere HA Advanced Options (Continued) Option Description fdm.isolationpolicydelaysec The number of seconds the system waits before executing the isolation policy once it is determined that a host is isolated. The minimum value is 30. If set to a value less than 30, the delay will be 30 seconds. das.respectvmvmantiaffinityrules Determines if vSphere HA enforces VM-VM anti-affinity rules. Default value is "true", whereby the rules are enforced.
vSphere Availability Table 2‑4. vSphere HA Advanced Options (Continued) Option Description das.reregisterrestartdisabledvms When vSphere HA is disabled on a specific VM this option ensures that the VM is registered on another host after a failure. This allows you to power-on that VM without needing to reregister it manually. Note When this option is used, vSphere HA does not power on the VM, but only registers it. das.
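Advanced options can also be pushed to the cluster as key/value pairs in the HA configuration. The following pyVmomi sketch is illustrative; the option keys come from Table 2-4, while the specific values and the das.isolationaddress0 index are placeholders.

```python
# Sketch: set vSphere HA advanced options on a cluster. Values are examples only.
from pyVmomi import vim

def set_ha_advanced_options(cluster):
    options = [
        vim.option.OptionValue(key="das.isolationaddress0", value="192.0.2.1"),
        vim.option.OptionValue(key="das.heartbeatdsperhost", value="3"),
        vim.option.OptionValue(key="das.respectvmvmantiaffinityrules", value="true"),
    ]
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(option=options))
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```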
6 (Optional) You can change other settings, such as the Automation level, VM restart priority, Response for Host Isolation, VMCP settings, VM Monitoring, or VM monitoring sensitivity settings.
Note You can view the cluster defaults for these settings by first expanding Relevant Cluster Settings and then expanding vSphere HA.
7 Click OK.
The virtual machine’s behavior now differs from the cluster defaults for each setting that you changed.
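Per-VM overrides have an API counterpart as well. The sketch below overrides the restart priority and isolation response for a single VM; the chosen values and the use of the add operation (rather than edit for an existing override) are assumptions for illustration, with the cluster and VM objects obtained as in the earlier sketches.

```python
# Sketch: override HA settings for one VM while leaving cluster defaults untouched.
from pyVmomi import vim

def override_vm_ha_settings(cluster, vm):
    vm_settings = vim.cluster.DasVmSettings(
        restartPriority="high",
        isolationResponse="powerOff")
    vm_spec = vim.cluster.DasVmConfigSpec(
        operation=vim.option.ArrayUpdateOperation.add,
        info=vim.cluster.DasVmConfigInfo(key=vm, dasSettings=vm_settings))
    spec = vim.cluster.ConfigSpecEx(dasVmConfigSpec=[vm_spec])
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```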
Networks Used for vSphere HA Communications
To identify which network operations might disrupt the functioning of vSphere HA, you must know which management networks are being used for heartbeating and other vSphere HA communications.
n On legacy ESX hosts in the cluster, vSphere HA communications travel over all networks that are designated as service console networks. VMkernel networks are not used by these hosts for vSphere HA communications.
vSphere Availability The first way you can implement network redundancy is at the NIC level with NIC teaming. Using a team of two NICs connected to separate physical switches improves the reliability of a management network. Because servers connected through two NICs (and through separate switches) have two independent paths for sending and receiving heartbeats, the cluster is more resilient.
Using Auto Deploy with vSphere HA
You can use vSphere HA and Auto Deploy together to improve the availability of your virtual machines. Auto Deploy provisions hosts when they power on, and you can also configure it to install the vSphere HA agent on hosts during the boot process. See the Auto Deploy documentation included in vSphere Installation and Setup for details.
Upgrading Hosts in a Cluster Using vSAN
If you are upgrading the ESXi hosts in your vSphere HA cluster to version 5.
Providing Fault Tolerance for Virtual Machines 3
You can use vSphere Fault Tolerance for your virtual machines to ensure continuity with higher levels of availability and data protection.
Fault Tolerance is built on the ESXi host platform, and it provides availability by having identical virtual machines run on separate hosts.
To obtain the optimal results from Fault Tolerance, you must be familiar with how it works, how to enable it for your cluster and virtual machines, and the best practices for its usage.
vSphere Availability The Primary and Secondary VMs continuously monitor the status of one another to ensure that Fault Tolerance is maintained. A transparent failover occurs if the host running the Primary VM fails, in which case the Secondary VM is immediately activated to replace the Primary VM. A new Secondary VM is started and Fault Tolerance redundancy is reestablished automatically. If the host running the Secondary VM fails, it is also immediately replaced.
vSphere Availability availability of critical information. With vSphere Fault Tolerance, you can protect this virtual machine before running this report and then turn off or suspend Fault Tolerance after the report has been produced. You can use On-Demand Fault Tolerance to protect the virtual machine during a critical time period and return the resources to normal during non-critical operation.
vSphere Availability Fault Tolerance Interoperability Before configuring vSphere Fault Tolerance, you must be aware of the features and products Fault Tolerance cannot interoperate with. vSphere Features Not Supported with Fault Tolerance When configuring your cluster, you should be aware that not all vSphere features can interoperate with Fault Tolerance. The following vSphere features are not supported for fault tolerant virtual machines. n Snapshots.
Table 3-1. Features and Devices Incompatible with Fault Tolerance and Corrective Actions
Incompatible Feature or Device: Physical Raw Disk mapping (RDM).
Corrective Action: With legacy FT you can reconfigure virtual machines with physical RDM-backed virtual devices to use virtual RDMs instead.
Incompatible Feature or Device: CD-ROM or floppy virtual devices backed by a physical or remote device.
Corrective Action: Remove the CD-ROM or floppy virtual device or reconfigure the backing with an ISO installed on shared storage.
vSphere Availability The tasks you should complete before attempting to set up Fault Tolerance for your cluster include the following: n Ensure that your cluster, hosts, and virtual machines meet the requirements outlined in the Fault Tolerance checklist. n Configure networking for each host. n Create the vSphere HA cluster, add hosts, and check compliance. After your cluster and hosts are prepared for Fault Tolerance, you are ready to turn on Fault Tolerance for your virtual machines.
vSphere Availability Virtual Machine Requirements for Fault Tolerance You must meet the following virtual machine requirements before you use Fault Tolerance. n No unsupported devices attached to the virtual machine. See Fault Tolerance Interoperability. n Incompatible features must not be running with the fault tolerant virtual machines. See Fault Tolerance Interoperability. n Virtual machine files (except for the VMDK files) must be stored on shared storage.
vSphere Availability 4 Click the Add Networking icon. 5 Provide appropriate information for your connection type. 6 Click Finish. After you create both a vMotion and Fault Tolerance logging virtual switch, you can create other virtual switches, as needed. Add the host to the cluster and complete any steps needed to turn on Fault Tolerance.
vSphere Availability Several validation checks are performed on a virtual machine before Fault Tolerance can be turned on. n SSL certificate checking must be enabled in the vCenter Server settings. n The host must be in a vSphere HA cluster or a mixed vSphere HA and DRS cluster. n The host must have ESXi 6.x or greater installed. n The virtual machine must not have snapshots. n The virtual machine must not be a template. n The virtual machine must not have vSphere HA disabled.
After these checks are passed, the Primary and Secondary VMs are powered on and placed on separate, compatible hosts. The virtual machine's Fault Tolerance Status is tagged as Protected.
Turn On Fault Tolerance
You can turn on vSphere Fault Tolerance through the vSphere Client. When Fault Tolerance is turned on, vCenter Server resets the virtual machine's memory limit and sets the memory reservation to the memory size of the virtual machine.
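For scripted environments, turning on Fault Tolerance corresponds, as far as I can tell, to creating the Secondary VM for the Primary VM through the API. The sketch below is an assumption-laden illustration rather than the documented workflow; vm is a vim.VirtualMachine obtained with pyVmomi as in the earlier sketches, and host=None lets vCenter Server choose a compatible placement.

```python
# Sketch: turn on Fault Tolerance for a VM by creating its Secondary VM.
def turn_on_fault_tolerance(vm, host=None):
    # Mirror two of the validation checks listed above before calling the API.
    if vm.snapshot is not None:
        raise ValueError("Fault Tolerance cannot be turned on for a VM with snapshots")
    if vm.config.template:
        raise ValueError("Fault Tolerance cannot be turned on for a template")
    # The task result references the newly created Secondary VM.
    return vm.CreateSecondaryVM_Task(host=host)
```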
vSphere Availability Use the Turn Off Fault Tolerance option if you do not plan to reenable the feature. Otherwise, use the Suspend Fault Tolerance option. Note If the Secondary VM resides on a host that is in maintenance mode, disconnected, or not responding, you cannot use the Turn Off Fault Tolerance option. In this case, you should suspend and resume Fault Tolerance instead. Procedure 1 In the vSphere Client, browse to the virtual machine for which you want to turn off Fault Tolerance.
vSphere Availability 2 Right-click the virtual machine and select Fault Tolerance > Migrate Secondary. 3 Complete the options in the Migrate dialog box and confirm the changes that you made. 4 Click Finish to apply the changes. The Secondary VM associated with the selected fault tolerant virtual machine is migrated to the specified host. Test Failover You can induce a failover situation for a selected Primary VM to test your Fault Tolerance protection.
vSphere Availability Verify that you have sets of four or more ESXi hosts that are hosting fault tolerant virtual machines that are powered on. If the virtual machines are powered off, the Primary and Secondary VMs can be relocated to hosts with different builds. Note This upgrade procedure is for a minimum four-node cluster. The same instructions can be followed for a smaller cluster, though the unprotected interval will be slightly longer.
vSphere Availability n Use deterministic teaming policies to ensure particular traffic types have an affinity to a particular NIC (active/standby) or set of NICs (for example, originating virtual port-id). n Where active/standby policies are used, pair traffic types to minimize impact in a failover situation where both traffic types will share a vmnic.
vSphere Availability In a partitioned vSphere HA cluster using Fault Tolerance, the Primary VM (or its Secondary VM) could end up in a partition managed by a master host that is not responsible for the virtual machine. When a failover is needed, a Secondary VM is restarted only if the Primary VM was in a partition managed by the master host responsible for it.
vSphere Availability Problem When you attempt to power on a virtual machine with Fault Tolerance enabled, an error message might appear if you did not enable HV. Cause This error is often the result of HV not being available on the ESXi server on which you are attempting to power on the virtual machine. HV might not be available either because it is not supported by the ESXi server hardware or because HV is not enabled in the BIOS.
vSphere Availability Problem When a Secondary VM resides on a host that is heavily loaded, the Secondary VM can affect the performance of the Primary VM. Cause A Secondary VM running on a host that is overcommitted (for example, with its CPU resources) might not get the same amount of resources as the Primary VM. When this occurs, the Primary VM must slow down to allow the Secondary VM to keep up, effectively reducing its execution speed to the slower speed of the Secondary VM.
vSphere Availability Some Hosts Are Overloaded with FT Virtual Machines You might encounter performance problems if your cluster's hosts have an imbalanced distribution of FT VMs. Problem Some hosts in the cluster might become overloaded with FT VMs, while other hosts might have unused resources. Cause vSphere DRS does not load balance FT VMs (unless they are using legacy FT). This limitation might result in a cluster where hosts are unevenly distributed with FT VMs.
vSphere Availability Solution When planning your FT deployment, place the metadata datastore on highly available storage. While FT is running, if you see that the access to the metadata datastore is lost on either the Primary VM or the Secondary VM, promptly address the storage problem before loss of access causes one of the previous problems. If a VM stops being recognized as an FT VM by vCenter Server, do not perform unsupported operations on the VM. Restore access to the metadata datastore.
vSphere Availability Solution If DRS does not place or evacuate FT VMs in the cluster, check the VMs for a VM override that is disabling DRS. If you find one, remove the override that is disabling DRS. Note For more information on how to edit or delete VM overrides, see vSphere Resource Management. Fault Tolerant Virtual Machine Failovers A Primary or Secondary VM can fail over even though its ESXi host has not crashed.
vSphere Availability Lack of File System Space Prevents Secondary VM Startup Check whether or not your /(root) or /vmfs/datasource file systems have available space. These file systems can become full for many reasons, and a lack of space might prevent you from being able to start a new Secondary VM. VMware, Inc.
vCenter High Availability 4 vCenter High Availability (vCenter HA) protects vCenter Server Appliance against host and hardware failures. The active-passive architecture of the solution can also help you reduce downtime significantly when you patch vCenter Server Appliance. After some network configuration, you create a three-node cluster that contains Active, Passive, and Witness nodes. Different configuration paths are available. What you select depends on your existing configuration.
vSphere Availability 6 Troubleshoot Your vCenter HA Environment In case of problems you can troubleshoot your environment. The task you need to perform depends on the failure symptoms. For additional troubleshooting information, see the VMware Knowledge Base system.
vSphere Availability Table 4‑1. vCenter HA Nodes Node Description Active n Runs the active vCenter Server Appliance instance n Uses a public IP address for the management interface n Uses the vCenter HA network for replication of data to the Passive node. n Uses the vCenter HA network to communicate with the Witness node.
vSphere Availability vCenter HA Deployment Options You can set up your vCenter HA environment with an embedded Platform Services Controller or with an external Platform Services Controller. If you decide to use an external Platform Services Controller, you can place it behind a load balancer for protection in case of Platform Services Controller failure.
vSphere Availability vCenter HA with an External Platform Services Controller When you use vCenter HA with an external Platform Services Controller, you must set up an external load balancer to protect the Platform Services Controller. If one Platform Services Controller becomes unavailable, the load balancer directs the vCenter Server Appliance to a different Platform Services Controller. Set up of the external Platform Services Controller is discussed in the following VMware Knowledge Base articles.
vSphere Availability 5 As part of the clone process, the information about the external Platform Services Controller and the load balancer is cloned as well. 6 When configuration is complete, the vCenter Server Appliance is protected by vCenter HA. 7 If the Platform Services Controller instance becomes unavailable, the load balancer redirects requests for authentication or other services to the second Platform Services Controller instance.
vSphere Availability Advanced Configuration Workflow If you cannot select the Basic option or you want more control over your deployment, you can perform Advanced configuration. With this option, you are responsible for cloning the Active node yourself as part of vCenter HA setup. If you select this option and remove the vCenter HA configuration later, you are responsible for deleting the nodes that you created. For the Advanced option, the workflow is as follows.
n The vCenter HA network must be on a different subnet than the management network. The three nodes can be on the same subnet or on different subnets.
n Network latency between the Active, Passive, and Witness nodes must be less than 10 milliseconds.
n You must not add a default gateway entry for the cluster network.
Prerequisites
n The vCenter Server Appliance that later becomes the Active node is deployed.
Configure vCenter HA With the Basic Option
When you use the Basic option, the vCenter HA wizard creates and configures a second network adapter on the vCenter Server Appliance, clones the Active node, and configures the vCenter HA network.
Prerequisites
n Deploy the vCenter Server Appliance that you want to use as the initial Active node.
n The vCenter Server Appliance must have a static IP address.
n SSH must be enabled on the vCenter Server Appliance.
vSphere Availability 8 Review the information for the Passive and Witness nodes, click Edit to make changes, and click Next. If you are not using a DRS cluster, select different hosts and datastores for the Passive and Witness nodes if possible. 9 Click Finish. The Passive and Witness nodes are created. When vCenter HA configuration is complete, vCenter Server Appliance has high availability protection. What to do next See Manage the vCenter HA Configuration for a list of cluster management tasks.
Prerequisites
n Set up the infrastructure for the vCenter HA network. See Configure the Network.
n Deploy the vCenter Server Appliance that you want to use as the initial Active node.
n The vCenter Server Appliance must have a static IP address mapped to an FQDN.
n SSH must be enabled on the vCenter Server Appliance.
Procedure
1 Log in to the management vCenter Server with the vSphere Web Client.
5 Provide the IP address and subnet mask for the Passive and Witness nodes and click Next.
You have to specify these IP addresses now even though the nodes do not exist yet. You can no longer change these IP addresses after you click Next.
6 (Optional) Click Advanced if you want to override the failover management IP address for the Passive node.
7 Leave the wizard window open and perform the cloning tasks.
What to do next
Create and Configure the Clones of the Active Node.
vSphere Availability 3 After the first clone has been created, clone the Active node again for the Witness node. Option Value New Virtual Machine Name Name of the Witness node. For example, use vcsa-witness. Select Compute Resource Use a different target host and datastore than for the Active and Passive nodes if possible.
vSphere Availability n Set Up Your Environment to Use Custom Certificates The machine SSL certificate on each node is used for cluster management communication and for encryption of replication traffic. If you want to use custom certificates, you have to remove the vCenter HA configuration, delete the Passive and Witness nodes, provision the Active node with the custom certificate, and reconfigure the cluster.
Set up SNMP traps for the Active node and the Passive node. You tell the agent where to send related traps by adding a target entry to the snmpd configuration.
Procedure
1 Log in to the Active node by using the Virtual Machine Console or SSH.
2 Run the vicfg-snmp command, for example: vicfg-snmp -t 10.160.1.1@1166/public
In this example, 10.160.1.1 is the client listening address, 1166 is the client listening port, and public is the community string.
vSphere Availability Manage vCenter HA SSH Keys vCenter HA uses SSH keys for password-less authentication between the Active, Passive, and Witness nodes. The authentication is used for heartbeat exchange and file and data replication. To replace the SSH keys in the nodes of a vCenter HA cluster, you disable the cluster, generate new SSH keys on the Active node, transfer the keys to the passive node, and enable the cluster. Procedure 1 Edit the cluster and change the mode to Disabled.
vSphere Availability 4 After the failover, you can verify that the Passive node has the role of the Active node in the vSphere Web Client. Edit the vCenter HA Cluster Configuration When you edit the vCenter HA cluster configuration, you can disable or enable the cluster, place the cluster in maintenance mode, or remove the cluster. The operating mode of a vCenter Server Appliance controls the failover capabilities and state replication in a vCenter HA cluster.
vSphere Availability 3 4 Select one of the options. Option Result Enable vCenter HA Enables replication between the Active and Passive nodes. If the cluster is in a healthy state, your Active node is protected by automatic failover from the Passive node. Maintenance Mode In maintenance mode, replication still occurs between the Active and Passive nodes. However, automatic failover is disabled. Disable vCenter HA Disables replication and failover. Keeps the configuration of the cluster.
vSphere Availability Procedure 1 Log in to the Active node vCenter Server Appliance and click Configure. 2 Under Settings select vCenter HA and click Edit. 3 Select Remove vCenter HA cluster. n The vCenter HA cluster's configuration is removed from the Active, Passive, and Witness nodes. n The Active node continues to run as a standalone vCenter Server Appliance. n You cannot reuse the Passive and Witness nodes in a new vCenter HA configuration.
vSphere Availability 3 Change the vCenter Server Appliance configuration for the Active node, for example, from a Small environment to a Medium environment. 4 Reconfigure vCenter HA. Collecting Support Bundles for a vCenter HA Node Collecting a support bundle from all the nodes in a vCenter HA cluster helps with troubleshooting. When you collect a support bundle from the Active node in a vCenter HA cluster, the system proceeds as follows.
vSphere Availability vCenter HA Clone Operation Fails During Deployment If the vCenter HA configuration process does not create the clones successfully, you have to resolve that cloning error. Problem Clone operation fails. Note Cloning a Passive or Witness VM for a VCHA deployment to the same NFS 3.1 datastore as the source Active node VM fails. You must use NFS4 or clone the Passive and Witness VMs to a datastore different from the Active VM. Cause Look for the clone exception.
vSphere Availability Troubleshooting a Degraded vCenter HA Cluster For a vCenter HA cluster to be healthy, each of the Active, Passive, and Witness nodes must be fully operational and be reachable over the vCenter HA cluster network. If any of the nodes fails, the cluster is considered to be in a degraded state. Problem If the cluster is in a degraded state, failover cannot occur. For information about failure scenarios while the cluster is in a degraded state, see Resolving Failover Failures.
vSphere Availability Solution How you recover depends on the cause of the degraded cluster state. If the cluster is in a degraded state, events, alarms, and SNMP traps show errors. If one of the nodes is down, check for hardware failure or network isolation. Check whether the failed node is powered on. In case of replication failures, check if the vCenter HA network has sufficient bandwidth and ensure network latency is 10 ms or less.
vSphere Availability Cause A vCenter HA failover might not succeed for these reasons. n The Witness node becomes unavailable while the Passive node is trying to assume the role of the Active node. n An appliance state synchronization issue between the nodes exists. Solution You recover from this issue as follows. 1 If the Active node recovers from the failure, it becomes the Active node again. 2 If the Witness node recovers from the failure, follow these steps.
vSphere Availability Table 4‑4. The following events will raise VCHA health alarm in vpxd: (Continued) Event Name Event Description Event Type Category vCenter HA cluster state is currently isolated vCenter HA cluster state is currently isolated com.vmware.vcha.cluster.stat e.isolated error vCenter HA cluster is destroyed vCenter HA cluster is destroyed com.vmware.vcha.cluster.stat e.destroyed info Table 4‑5.
vSphere Availability Table 4‑8. File replication-related events Event Name Event Description Event Type Category Appliance {fileProviderType} is {state} Appliance File replication state changed com.vmware.vcha.file.replicati on.state.changed info Patching a vCenter High Availability Environment You can patch a vCenter Server Appliance which is in a vCenter High Availability cluster by using the software-packages utility available in the vCenter Server Appliance shell.
Using Microsoft Clustering Service for vCenter Server on Windows High Availability 5 When you deploy vCenter Server, you must build a highly available architecture that can handle workloads of all sizes. Availability is critical for solutions that require continuous connectivity to vCenter Server. To avoid extended periods of downtime, you can achieve continuous connectivity for vCenter Server by using a Microsoft Cluster Service (MSCS) cluster.
vSphere Availability Upgrade vCenter Server in an MSCS Environment If you are running vCenter Server 6.0, you must upgrade to vCenter Server 6.5 to set up an MSCS high availability environment. vCenter Server 6.0.x has 18 services, assuming that the PSC server is running on a different host. vCenter Server 6.5 has 3 services and the names have changed. An MSCS cluster configuration created to set up high availability for vCenter Server 6.0 becomes invalid after an upgrade to vCenter Server 6.5.
vSphere Availability 8 Set up the MSCS cluster configuration again and set the startup type of all vCenter Server services to manual. 9 Shut down the primary node and detach the RDM disks, but do not delete them from the datastore. 10 After the reconfiguration is complete, select VM > Clone > Clone to Template, clone the secondary node, and change its IP and host name. 11 Keep the secondary node powered off and add both RDM disks to the primary node.
Figure 5-1. MSCS Cluster for vCenter Server High Availability
The figure shows two vCenter Server infrastructure nodes (N1 and N2), two vCenter Server management nodes in an MSCS cluster, and two SQL Server database VMs (Node1 and Node2) in a separate MSCS cluster.
Note MSCS as an availability solution for vCenter Server is provided only for management nodes of vCenter Server (M node).
vSphere Availability 9 Change the host name and IP address on the first VM (VM1). Note the original IP address and host name that were used at the time of the installation of vCenter Server on VM1. This information is used to assign a cluster role IP. 10 Install failover clustering on both nodes. 11 To create an MSCS cluster on VM1, include both nodes in the cluster. Also select the validation option for the new cluster. 12 To start configuring roles, select Generic Service and click Next.