6.5.1

vSphere Troubleshooting

Update 1

Modified on 04 OCT 2017

VMware vSphere 6.5

VMware ESXi 6.5

vCenter Server 6.5

Summary of content (105 pages)

PAGE 1
vSphere Troubleshooting Update 1 Modified on 04 OCT 2017 VMware vSphere 6.5 VMware ESXi 6.5 vCenter Server 6.
PAGE 2
vSphere Troubleshooting You can find the most up-to-date technical documentation on the VMware website at: https://docs.vmware.com/ If you have comments about this documentation, submit your feedback to docfeedback@vmware.com VMware, Inc. 3401 Hillview Ave. Palo Alto, CA 94304 www.vmware.com Copyright © 2010–2017 VMware, Inc. All rights reserved. Copyright and trademark information. VMware, Inc.
PAGE 3
Contents
About vSphere Troubleshooting 5
Updated Information 6
1 Troubleshooting Overview 7
  Guidelines for Troubleshooting 7
  Troubleshooting with Logs 9
2 Troubleshooting Virtual Machines 11
  Troubleshooting Fault Tolerant Virtual Machines 11
  Troubleshooting USB Passthrough Devices 16
  Recover Orphaned Virtual Machines 18
  Virtual Machine Does Not Power On After Cloning or Deploying from Template 19
3 Troubleshooting Hosts 21
  Troubleshooting vSphere HA Host States
  Troubleshooting vSphere Auto Deplo
PAGE 4
7 Troubleshooting Storage 61
  Resolving SAN Storage Display Problems 61
  Resolving SAN Performance Problems 63
  Virtual Machines with RDMs Need to Ignore SCSI INQUIRY Cache
  Software iSCSI Adapter Is Enabled When Not Needed 69
  Failure to Mount NFS Datastores 69
  Troubleshooting Storage Adapters 70
  Checking Metadata Consistency with VOMA 70
  No Failover for Storage Path When TUR Command Is Unsuccessful 72
  Troubleshooting Flash Devices 74
  Troubleshooting Virtual Volumes
  Trouble
PAGE 5
About vSphere Troubleshooting vSphere Troubleshooting describes troubleshooting issues and procedures for VMware vCenter Server implementations and related components. Intended Audience This information is for anyone who wants to troubleshoot virtual machines, ESXi hosts, clusters, and related storage solutions. The information in this book is for experienced Windows or Linux system administrators who are familiar with virtual machine technology and data center operations.
PAGE 6
Updated Information This vSphere Troubleshooting is updated with each release of the product or when necessary. This table provides the update history of the vSphere Troubleshooting.
Revision | Description
04 OCT 2017 | Updated log information in vSphere Auto Deploy TFTP Timeout Error at Boot Time. Updated log directories in Troubleshooting with Logs.
EN-002608-00 | Initial release.
PAGE 7
1 Troubleshooting Overview
vSphere Troubleshooting contains common troubleshooting scenarios and provides solutions for each of these problems. You can also find guidance here for resolving problems that have similar origins. For unique problems, consider developing and adopting a troubleshooting methodology. The following approach for effective troubleshooting elaborates on how to gather troubleshooting information, such as identifying symptoms and defining the problem space.
PAGE 8
vSphere Troubleshooting The first step in the troubleshooting process is to gather information that defines the specific symptoms of what is happening.
PAGE 9
- Develop and pursue a hierarchy of potential solutions based on likelihood. Systematically eliminate each potential problem from the most likely to the least likely until the symptoms disappear.
- When testing potential solutions, change only one thing at a time. If your setup works after many things are changed at once, you might not be able to discern which of those things made a difference.
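The two guidelines above can be sketched in a few lines of Python. This is only an illustration: the `Candidate` tuple, the likelihood estimates, and the `work_through` helper are hypothetical names, and each action is assumed to apply exactly one change and report whether the symptom is gone.

```python
from typing import Callable, List, Tuple

# (description, estimated likelihood, action that applies ONE change and
# returns True if the symptom has disappeared). All names are illustrative.
Candidate = Tuple[str, float, Callable[[], bool]]


def work_through(candidates: List[Candidate]) -> List[str]:
    """Try candidate fixes from most to least likely, one change at a time.

    Returns the descriptions that were tried, in order, and stops at the
    first change that makes the symptom disappear.
    """
    tried = []
    ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
    for description, _likelihood, apply_one_change in ordered:
        tried.append(description)
        if apply_one_change():
            break
    return tried
```

Because only one change is applied per iteration, the last entry in the returned list identifies the change that made the difference.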
PAGE 10
vSphere Troubleshooting Table 1‑2.
PAGE 11
2 Troubleshooting Virtual Machines
The virtual machine troubleshooting topics provide solutions to potential problems that you might encounter when using your virtual machines.
PAGE 12
vSphere Troubleshooting Solution If the ESXi server hardware supports HV, but HV is not currently enabled, enable HV in the BIOS on that server. The process for enabling HV varies among BIOSes. See the documentation for your hosts' BIOSes for details on how to enable HV. If the ESXi server hardware does not support HV, switch to hardware that uses processors that support Fault Tolerance.
PAGE 13
Solution If the Secondary VM is on an overcommitted host, you can move the VM to another location without resource contention problems. More specifically, do the following:
- For FT networking contention, use vMotion technology to move the Secondary VM to a host with fewer FT VMs contending on the FT network. Verify that the quality of the storage access to the VM is not asymmetric.
- For storage contention problems, turn FT off and on again.
PAGE 14
vSphere Troubleshooting Cause vSphere DRS does not load balance FT VMs (unless they are using legacy FT). This limitation might result in a cluster where hosts are unevenly distributed with FT VMs. Solution Manually rebalance the FT VMs across the cluster by using vSphere vMotion. Generally, the fewer FT VMs that are on a host, the better they perform, due to reduced contention for FT network bandwidth and CPU resources.
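As a rough aid for the manual rebalancing described above, the following sketch counts FT VMs per host and suggests one move from the busiest host to the quietest. The function name and the input mapping are hypothetical, and the sketch ignores FT compatibility checks and the rule that a Primary VM and its Secondary VM must run on different hosts.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def suggest_ft_move(ft_vm_hosts: Dict[str, str], hosts: List[str]) -> Optional[Tuple[str, str]]:
    """Return (source_host, target_host) for one manual vMotion, or None.

    ft_vm_hosts maps an FT VM name to the host it currently runs on.
    """
    counts = Counter({h: 0 for h in hosts})
    counts.update(ft_vm_hosts.values())
    busiest = max(hosts, key=lambda h: counts[h])
    quietest = min(hosts, key=lambda h: counts[h])
    # A move only helps if it narrows the gap between the two hosts.
    if counts[busiest] - counts[quietest] < 2:
        return None
    return busiest, quietest
```

Calling the function repeatedly after each migration gives a simple way to even out the distribution by hand.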
PAGE 15
vSphere Troubleshooting Problem When you select Turn On Fault Tolerance for a powered-on VM, the operation fails and you see an Unknown error message. Cause This operation can fail if the host that the VM is running on has insufficient memory resources to provide fault tolerant protection. vSphere Fault Tolerance automatically tries to allocate a full memory reservation on the host for the VM. Overhead memory is required for fault tolerant VMs and can sometimes expand to 1 to 2 GB.
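A quick arithmetic check can show whether a host is likely to have room before you turn on Fault Tolerance. The sketch below assumes that the requirement is the full VM memory plus overhead, and uses 2048 MB (the upper end of the 1 to 2 GB range mentioned above) as a conservative default. The function name and parameters are hypothetical.

```python
def has_room_for_ft(host_free_mb: int, vm_memory_mb: int, overhead_mb: int = 2048) -> bool:
    """Return True if free host memory covers a full reservation plus overhead."""
    return host_free_mb >= vm_memory_mb + overhead_mb
```

For example, an 8 GB VM would need roughly 10 GB of free memory on the host under these assumptions.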
PAGE 16
vSphere Troubleshooting Partial Hardware Failure Related to Storage This problem can arise when access to storage is slow or down for one of the hosts. When this occurs there are many storage errors listed in the VMkernel log. To resolve this problem you must address your storage-related problems.
PAGE 17
vSphere Troubleshooting Error Message When You Try to Migrate Virtual Machine with USB Devices Attached Migration with vMotion cannot proceed and issues a confusing error message when you connect multiple USB devices from an ESXi host to a virtual machine and one or more devices are not enabled for vMotion. Problem The Migrate Virtual Machine wizard runs a compatibility check before a migration operation begins.
PAGE 18
3 After you reconnect the device, restart the usbarbitrator service: /etc/init.d/usbarbitrator start
4 Restart hostd and any running virtual machines to restore access to the passthrough devices in the virtual machine.
What to do next Reconnect the USB devices to the virtual machine.
Recover Orphaned Virtual Machines Virtual machines appear with (orphaned) appended to their names.
PAGE 19
vSphere Troubleshooting Virtual Machine Does Not Power On After Cloning or Deploying from Template Virtual machines do not power on after you complete the clone or deploy from template workflow in the vSphere Web Client. Problem When you clone a virtual machine or deploy a virtual machine from a template, you might not be able to power on the virtual machine after creation. Cause The swap file size is not reserved when the virtual machine disks are created.
PAGE 20
d Click Edit.
Note If the host is part of a cluster that specifies that the virtual machine swap files are stored in the same directory as the virtual machine, you cannot click Edit. You must use the Cluster Settings dialog box to change the swap file location policy for the cluster.
e Select Use a specific datastore and select a datastore from the list.
f Click OK.
PAGE 21
3 Troubleshooting Hosts
The host troubleshooting topics provide solutions to potential problems that you might encounter when using your vCenter Servers and ESXi hosts.
PAGE 22
vSphere Troubleshooting Cause A vSphere HA agent can be in the Agent Unreachable state for several reasons. This condition most often indicates that a networking problem is preventing vCenter Server or the master host from contacting the agent on the host, or that all hosts in the cluster have failed.
PAGE 23
vSphere Troubleshooting vSphere HA Agent is in the Initialization Error State The vSphere HA agent on a host is in the Initialization Error state for a minute or more. User intervention is required to resolve this situation. Problem vSphere HA reports that an agent is in the Initialization Error state when the last attempt to configure vSphere HA for the host failed. vSphere HA does not monitor the virtual machines on such a host and might not restart them after a failure.
PAGE 24
vSphere Troubleshooting Problem vSphere HA reports that an agent is in the Uninitialization Error state when vCenter Server is unable to unconfigure the agent on the host during the Unconfigure HA task. An agent left in this state can interfere with the operation of the cluster. For example, the agent on the host might elect itself as master host and lock a datastore. Locking a datastore prevents the valid cluster master host from managing the virtual machines with configuration files on that datastore.
PAGE 25
vSphere Troubleshooting Problem While the virtual machines running on the host continue to be monitored by the master hosts that are responsible for them, vSphere HA's ability to restart the virtual machines after a failure is affected. First, each master host has access to a subset of the hosts, so less failover capacity is available to each host. Second, vSphere HA might be unable to restart a FT Secondary VM after a failure (see Primary VM Remains in the Need Secondary State).
PAGE 26
Cause A host is network isolated if both of the following conditions are met:
- Isolation addresses have been configured and the host is unable to ping them.
- The vSphere HA agent on the host is unable to access any of the agents running on the other cluster hosts.
Note If your vSphere HA cluster has vSAN enabled, a host is determined to be isolated if it cannot communicate with the other vSphere HA agents in the cluster and cannot reach the configured isolation addresses.
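The isolation test above is a logical AND of two conditions, which the following sketch makes explicit. The function and parameter names are hypothetical, and "unable to ping them" is interpreted here as none of the configured isolation addresses responding.

```python
from typing import Dict


def is_network_isolated(isolation_ping_ok: Dict[str, bool], reachable_peer_agents: int) -> bool:
    """Apply the two conditions from the text; both must hold.

    isolation_ping_ok maps each configured isolation address to whether it
    answered a ping. reachable_peer_agents is the number of other vSphere HA
    agents the host can still contact.
    """
    addresses_configured = len(isolation_ping_ok) > 0
    cannot_ping_any = addresses_configured and not any(isolation_ping_ok.values())
    cannot_reach_agents = reachable_peer_agents == 0
    return cannot_ping_any and cannot_reach_agents
```

A host that can still reach even one peer agent, or one isolation address, is not treated as isolated in this sketch.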
PAGE 27
Problem A TFTP Timeout error message appears when a host provisioned with vSphere Auto Deploy boots. The text of the message depends on the BIOS. Cause The TFTP server is down or unreachable. Solution
- Ensure that your TFTP service is running and reachable by the host that you are trying to boot.
- To view the diagnostic logs for details on the present error, see your TFTP service documentation.
PAGE 28
Solution Correct the IP address of the vSphere Auto Deploy server in the tramp file, as explained in the vSphere Installation and Setup documentation. Package Warning Message When You Assign an Image Profile to a vSphere Auto Deploy Host When you run a vSphere PowerCLI cmdlet that assigns an image profile that is not vSphere Auto Deploy ready, a warning message appears.
PAGE 29
Solution
1 Install ESXi Dump Collector on a system of your choice. ESXi Dump Collector is included with the vCenter Server installer.
2 Use ESXCLI to configure the host to use ESXi Dump Collector.
esxcli conn_options system coredump network set IP-addr,port
esxcli system coredump network set -e true
3 Use ESXCLI to disable local coredump partitions.
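When the commands in step 2 are scripted for several hosts, it can help to build them as argument lists. The sketch below is one way to do that. The helper name is hypothetical, the long option names (--interface-name, --server-ipv4, --server-port) are given as recalled from the ESXCLI reference and should be confirmed with the command's --help output, and the connection options (conn_options) are left out.

```python
from typing import List


def dump_collector_commands(vmk: str, server_ipv4: str, port: int) -> List[List[str]]:
    """Build the two ESXCLI invocations from step 2 as argument lists."""
    base = ["esxcli", "system", "coredump", "network", "set"]
    return [
        # Point the host at the ESXi Dump Collector (option names are an assumption).
        base + ["--interface-name", vmk, "--server-ipv4", server_ipv4, "--server-port", str(port)],
        # Enable network core dumps, as shown in the procedure.
        base + ["-e", "true"],
    ]
```

Each list can be passed to a process runner such as subprocess.run, or joined with spaces for display.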
PAGE 30
Solution You can assign an image profile to the host by running the Apply-EsxImageProfile cmdlet, or by creating the following rule:
1 Run the New-DeployRule cmdlet to create a rule that includes a pattern that matches the host with an image profile.
2 Run the Add-DeployRule cmdlet to add the rule to a ruleset.
3 Run the Test-DeployRuleSetCompliance cmdlet and use the output of that cmdlet as the input to the Repair-DeployRuleSetCompliance cmdlet.
PAGE 31
Solution
1 Log in to the system on which you installed the vSphere Auto Deploy server.
2 Check that the vSphere Auto Deploy server is running.
a Click Start > Settings > Control Panel > Administrative Tools.
b Double-click Services to open the Services Management panel.
c In the Services field, look for the VMware vSphere Auto Deploy Waiter service and restart the service if it is not running.
PAGE 32
Solution
1 Check that the DHCP server service is running on the Windows system on which the DHCP server is set up to provision hosts.
a Click Start > Settings > Control Panel > Administrative Tools.
b Double-click Services to open the Services Management panel.
c In the Services field, look for the DHCP server service and restart the service if it is not running.
PAGE 33
vSphere Troubleshooting Recovering from Database Corruption on the vSphere Auto Deploy Server In some situations, you might have a problem with the vSphere Auto Deploy database. The most efficient recovery option is to replace the existing database file with the most recent backup. Problem When you use vSphere Auto Deploy to provision the ESXi hosts in your environment, you might encounter a problem with the vSphere Auto Deploy database. Important This is a rare problem.
PAGE 34
vSphere Troubleshooting Authentication Token Manipulation Error Creating a password that does not meet the authentication requirements of the host causes an error. Problem When you create a password on the host, the following fault message appears: A general system error occurred: passwd: Authentication token manipulation error. The following message is included: Failed to set the password. It is possible that your password does not meet the complexity criteria set by the system.
PAGE 35
vSphere Troubleshooting Cause Active Directory requires the activeDirectoryAll firewall rule set. You must enable the rule set in the firewall configuration. If you omit this setting, the system adds the necessary firewall rules when the host joins the domain, but the host will be noncompliant because of the mismatch in firewall rules. The host will also be noncompliant if you remove it from the domain without disabling the Active Directory rule set.
PAGE 36
4 Edit the access permissions of the service.xml file to allow writes by running the chmod command.
- To allow writes, run chmod 644 /etc/vmware/firewall/service.xml.
- To toggle the sticky bit flag, run chmod +t /etc/vmware/firewall/service.xml.
5 Open the service.xml file in a text editor.
6 Add a new rule to the service.xml file that enables the custom port for the vCenter Server reverse proxy.
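To make the effect of the two chmod commands in step 4 concrete, the following sketch renders the resulting permission bits the way ls -l displays them. It uses only Python's standard stat module and does not touch any file. The variable and function names are illustrative.

```python
import stat

REGULAR_FILE = stat.S_IFREG  # file-type bits for an ordinary file


def describe(mode_bits: int) -> str:
    """Render permission bits the way `ls -l` would for a regular file."""
    return stat.filemode(REGULAR_FILE | mode_bits)


after_chmod_644 = 0o644  # owner read/write, group and others read-only
after_chmod_plus_t = after_chmod_644 | stat.S_ISVTX  # chmod +t sets the sticky bit

print(describe(after_chmod_644))     # -rw-r--r--
print(describe(after_chmod_plus_t))  # -rw-r--r-T
```

The capital T in the second result indicates that the sticky bit is set while execute permission for others is not.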
PAGE 37
10 (Optional) If you want the firewall configuration to persist after a reboot of the ESXi host, copy the service.xml file onto persistent storage and modify the local.sh file.
a Copy the modified service.xml file onto persistent storage, for example /store/, or onto a VMFS volume, for example /vmfs/volumes/volume/.
cp /etc/vmware/firewall/service.xml location_of_xml_file
You can store a VMFS volume in a single location and copy it to multiple hosts.
b Add the service.
PAGE 38
4 Troubleshooting vCenter Server and the vSphere Web Client
The vCenter Server and vSphere Web Client troubleshooting topics provide solutions to problems you might encounter when you set up and configure vCenter Server and the vSphere Web Client, including vCenter Single Sign-On.
PAGE 39
vSphere Troubleshooting 4 Reboot the vCenter Server machine before upgrading. This releases any locked files that are used by the Tomcat process, and enables the vCenter Server installer to stop the Tomcat service for the upgrade. Alternatively, you can restart the vCenter Server machine and restart the upgrade process, but select the option not to overwrite the vCenter Server data.
PAGE 40
Cause In vSphere 5.1 and later, you log in to the vSphere Web Client to view and manage multiple instances of vCenter Server. Any vCenter Server system on which you have permissions appears in the inventory if the server is registered with the same Component Manager as the vSphere Web Client. Solution
- Log in to the vSphere Web Client as a user with permissions on the vCenter Server system.
PAGE 41
Solution Edit the webclient.properties file to add the line html.console.port=port, where port is the new port number. The webclient.properties file is located in one of the following locations, depending on the operating system on the machine on which the vSphere Web Client is installed:
Windows 2008 | C:\ProgramData\VMware\vCenterServer\cfg\vsphere-client\
vCenter Server Appliance | /etc/vmware/vsphere-client/
Troubleshooting vCenter Server and ESXi Host Certificates Certificat
PAGE 42
vSphere Troubleshooting Cannot Configure vSphere HA When Using Custom SSL Certificates After you install custom SSL certificates, attempts to enable vSphere High Availability (HA) fail. Problem When you attempt to enable vSphere HA on a host with custom SSL certificates installed, the following error message appears: vSphere HA cannot be configured on this host because its SSL thumbprint has not been verified.
PAGE 43
5 Troubleshooting Availability
The availability troubleshooting topics provide solutions to potential problems that you might encounter when using your hosts and datastores in vSphere HA clusters. You might get an error message when you try to use vSphere HA or vSphere FT. For information about these error messages, see the VMware knowledge base article at http://kb.vmware.com/kb/1033634.
PAGE 44
vSphere Troubleshooting Another possible cause of this problem is if your cluster contains any virtual machines that have much larger memory or CPU reservations than the others. The Host Failures Cluster Tolerates admission control policy is based on the calculation on a slot size consisting of two components, the CPU and memory reservations of a virtual machine.
PAGE 45
vSphere Troubleshooting Solution View the Advanced Runtime Info pane that appears in the vSphere HA section of the cluster's Monitor tab in the vSphere Web Client. This information pane shows the slot size and how many available slots there are in the cluster. If the slot size appears too high, click on the Resource Allocation tab of the cluster and sort the virtual machines by reservation to determine which have the largest CPU and memory reservations.
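The effect of one large reservation on the slot size can be illustrated with a short calculation. This sketch assumes that each slot component is simply the largest CPU and the largest memory reservation among the virtual machines. Memory overhead and any default or advanced-option values are left out. The names and sample figures are hypothetical.

```python
from typing import Dict, List, Tuple

# VM name -> (CPU reservation in MHz, memory reservation in MB).
Reservations = Dict[str, Tuple[int, int]]


def slot_size(vms: Reservations) -> Tuple[int, int]:
    """Largest CPU reservation and largest memory reservation across the VMs."""
    cpu = max((r[0] for r in vms.values()), default=0)
    mem = max((r[1] for r in vms.values()), default=0)
    return cpu, mem


def largest_reservations(vms: Reservations, top: int = 3) -> List[str]:
    """VM names sorted by (memory, CPU) reservation, largest first."""
    ranked = sorted(vms, key=lambda name: (vms[name][1], vms[name][0]), reverse=True)
    return ranked[:top]
```

In a cluster where most virtual machines reserve about 1 GB, a single virtual machine with a 16 GB reservation would set the memory component of every slot, which is why sorting by reservation quickly shows the outlier.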
PAGE 46
vSphere Troubleshooting vCenter Server automatically selects a preferred set of datastores for heartbeating. This selection is made with the goal of maximizing the number of hosts that have access to a given datastore and minimizing the likelihood that the selected datastores are backed by the same storage array or NFS server. In most cases, this selection should not be changed.
PAGE 47
Problem The operation to unmount or remove a datastore fails if the datastore has any opened files. For these user operations, the vSphere HA agent closes all the files that it has opened, for example, heartbeat files. If the agent is not reachable by vCenter Server or the agent cannot flush out pending I/Os to close the files, a The HA agent on host '{hostName}' failed to quiesce file activity on datastore '{dsName}' fault is triggered.
PAGE 48
In this situation, vCenter Server reports the vSphere HA host state for the cluster hosts as Agent Unreachable or Agent Uninitialized and reports a cluster configuration problem that a master host has not been found.
- Multiple master hosts exist and the one with which vCenter Server is communicating is not responsible for the virtual machine.
PAGE 49
- vSphere HA attempted to restart the virtual machine but encountered a fatal error each time it tried.
- Your cluster's shared storage is vSAN and one of the virtual machine's files has become inaccessible due to the occurrence of more than the specified number of host failures.
- Restart actually succeeded.
Solution To avoid virtual machine restart failures, check that virtual machines become protected by vSphere HA after they are powered on.
PAGE 50
vSphere Troubleshooting Cause To restart a Secondary VM, vSphere HA requires that the Primary VM be running on a host that is in the same partition as the one containing the vSphere HA master host responsible for the FT pair. In addition, the vSphere HA agent on the Primary VM’s host must be operating correctly. If these conditions are met, FT also requires that there be at least one other host in the same partition that is compatible with the FT pair and that has a functioning vSphere HA agent.
PAGE 51
vSphere Troubleshooting Datastore Inaccessibility Is Not Resolved for a VM When a datastore becomes inaccessible, VMCP might not terminate and restart the affected virtual machines. Problem When an All Paths Down (APD) or Permanent Device Loss (PDL) failure occurs and a datastore becomes inaccessible, VMCP might not resolve the issue for the affected virtual machines.
PAGE 52
6 Troubleshooting Resource Management
The resource management troubleshooting topics provide solutions to potential problems that you might encounter when using your hosts and datastores in vSphere DRS or vSphere Storage DRS clusters.
PAGE 53
- The disk is a CD-ROM/ISO file.
- If the disk is an independent disk, Storage DRS is disabled, except in the case of relocation or clone placement.
- If the virtual machine has system files on a separate datastore from the home datastore (legacy), Storage DRS is disabled on the home disk.
PAGE 54
- If Storage DRS rules are preventing Storage DRS from making migration recommendations, you can remove or disable particular rules.
a Browse to the datastore cluster in the vSphere Web Client object navigator.
b Click the Manage tab and click Settings.
c Under Configuration, select Rules and click the rule.
d Click Remove.
PAGE 55
Solution
- The datastore must be visible in only one data center. Move the hosts to the same data center or unmount the datastore from hosts that reside in other data centers.
- Ensure that all hosts associated with the datastore cluster are ESXi 5.0 or later.
- Ensure that all hosts associated with the datastore cluster have Storage I/O Control enabled.
PAGE 56
vSphere Troubleshooting Storage DRS is Enabled on a Virtual Machine Deployed from an OVF Template Storage DRS is enabled on a virtual machine that was deployed from an OVF template that has Storage DRS disabled. This can occur when you deploy an OVF template on a datastore cluster. Problem When you deploy an OVF template with Storage DRS disabled on a datastore cluster, the resulting virtual machine has Storage DRS enabled.
PAGE 57
vSphere Troubleshooting Problem When you remove a virtual machine from a datastore cluster, and that virtual machine is subject to an affinity or anti-affinity rule in a datastore cluster, the rule remains. This allows you to store virtual machine configurations in different datastore clusters. If the virtual machine is moved back into the datastore cluster, the rule is applied. You cannot delete the rule after you remove the virtual machine from the datastore cluster.
PAGE 58
vSphere Troubleshooting Applying Storage DRS Recommendations Fails Storage DRS generates space or I/O load balancing recommendations, but attempts to apply the recommendations fail. Problem When you apply Storage DRS recommendations for space or I/O load balancing, the operation fails. Cause The following scenarios can prevent you from applying Storage DRS recommendations.
PAGE 59
vSphere Troubleshooting Unmanaged Workload Detected on Datastore In the vSphere Web Client, an alarm is triggered when vCenter Server detects that a workload from a host might be affecting performance. Problem The alarm Unmanaged workload is detected on the datastore is triggered. Cause The array is shared with non-vSphere workloads, or the array is performing system tasks such as replication. Solution There is no solution.
PAGE 60
Cause The following reasons might prevent you from enabling Storage I/O Control on a datastore.
- At least one host that is connected to the datastore is not running ESX/ESXi 4.1 or later.
- You do not have the appropriate license to enable Storage I/O Control.
Solution
- Verify that the hosts connected to the datastore are ESX/ESXi 4.1 or later.
- Verify that you have the appropriate license to enable Storage I/O Control.
PAGE 61
7 Troubleshooting Storage
The storage troubleshooting topics provide solutions to potential problems that you might encounter when using vSphere in different storage environments that include SAN, vSAN, or Virtual Volumes.
PAGE 62
vSphere Troubleshooting Table 7‑1. Troubleshooting Fibre Channel LUN Display Troubleshooting Task Description Check cable connectivity. If you do not see a port, the problem could be cable connectivity. Check the cables first. Ensure that cables are connected to the ports and a link light indicates that the connection is good. If each end of the cable does not show a good link light, replace the cable. Check zoning.
PAGE 63
Table 7‑2. Troubleshooting iSCSI LUN Display (Continued) Troubleshooting Task Description Check access control configuration. If the expected LUNs do not appear after rescan, access control might not be configured correctly on the storage system side:
- If CHAP is configured, ensure that it is enabled on the ESXi host and matches the storage system setup.
- If IP-based filtering is used, ensure that the iSCSI HBA or the VMkernel port group IP address is allowed.
PAGE 64
vSphere Troubleshooting Problem Excessive SCSI reservations cause performance degradation and SCSI reservation conflicts. Cause Several operations require VMFS to use SCSI reservations.
PAGE 65
vSphere Troubleshooting Problem Your host is unable to access a LUN, or access is very slow. The host's log files might indicate frequent path state changes. For example: Frequent path state changes are occurring for path vmhba2:C0:T0:L3. This may indicate a storage problem. Affected device: naa.600600000000000000edd1. Affected datastores: ds1 Cause The problem might be caused by path thrashing.
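When you scan host logs for this message, a small parser can pull out the affected path, device, and datastores. The following sketch assumes that log entries follow the wording of the example message above. The pattern and function name are illustrative and not part of any VMware tool.

```python
import re
from typing import Dict, Optional

# Pattern modeled on the example message in the text (an assumption about wording).
PATTERN = re.compile(
    r"Frequent path state changes are occurring for path (?P<path>\S+?)\. "
    r".*?Affected device: (?P<device>\S+?)\. "
    r"Affected datastores: (?P<datastores>.+)"
)


def parse_path_warning(line: str) -> Optional[Dict[str, str]]:
    """Return path, device, and datastores from a matching line, else None."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None
```

Grouping the parsed results by path over time makes it easier to see whether the same path keeps changing state.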
PAGE 66
vSphere Troubleshooting Solution 1 If the sum of active commands from all virtual machines consistently exceeds the LUN depth, increase the queue depth. The procedure that you use to increase the queue depth depends on the type of storage adapter the host uses. 2 When multiple virtual machines are active on a LUN, change the Disk.SchedNumReqOutstanding (DSNRO) parameter, so that it matches the queue depth value.
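The condition in step 1 is a comparison between a sum and a limit, which can be checked against collected samples as in the following sketch. The function name and sample format are hypothetical, and "consistently" is interpreted here as the total exceeding the queue depth in every sample.

```python
from typing import Iterable, Mapping


def exceeds_lun_queue_depth(samples: Iterable[Mapping[str, int]], lun_queue_depth: int) -> bool:
    """True if total active commands exceed the LUN queue depth in every sample.

    Each sample maps a VM name to its active command count at one point in time.
    """
    totals = [sum(sample.values()) for sample in samples]
    return bool(totals) and all(total > lun_queue_depth for total in totals)
```

A result of True suggests that increasing the queue depth is worth considering. A single high sample among many low ones does not meet the condition in this sketch.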
PAGE 67
2 Adjust the queue depth for the appropriate module.
esxcli --server=server_name system module parameters set -p parameter=value -m module
Use the following strings for the parameter and module options.
PAGE 68
3 Verify your changes by running the esxcli --server=server_name system module parameters list -m iscsi_vmk command. The following output shows the queue depth for software iSCSI.
iscsivmk_LunQDepth int 64 Maximum Outstanding Commands Per LUN
Caution Setting the queue depth to a value higher than the default can decrease the total number of LUNs supported.
Change the Outstanding IO Requests Setting If you adjusted the LUN queue depth, change the Disk.
PAGE 69
vSphere Troubleshooting Cause This behavior might be caused by cached SCSI INQUIRY data that interferes with specific guest operating systems and applications. When the ESXi host first connects to a target storage device on a SAN, it issues the SCSI INQUIRY command to obtain basic identification data from the device. By default, ESXi caches the received SCSI INQUIRY data (Standard, page 80, and page 83) and the data remains unchanged afterwards.
PAGE 70
vSphere Troubleshooting Cause ESXi supports the use of non-ASCII characters for directory and filenames on NFS storage, so you can create datastores and virtual machines using names in international languages. However, when the underlying NFS server does not offer internationalization support, unpredictable failures might occur. Solution Always make sure that the underlying NFS server offers internationalization support. If the server does not, use only ASCII characters.
PAGE 71
Problem You can check metadata consistency when you experience problems with a VMFS datastore or a virtual flash resource. For example, perform a metadata check if one of the following occurs:
- You experience storage outages.
- After you rebuild RAID or perform a disk replacement.
- You see metadata errors in the vmkernel.log file similar to the following:
cpu11:268057)WARNING: HBX: 599: Volume 50fd60a3-3aae1ae2-3347-0017a4770402 ("") may be damaged on disk.
PAGE 72
The output lists possible errors. For example, the following output indicates that the heartbeat address is invalid.
XXXXXXXXXXXXXXXXXXXXXXX
Phase 2: Checking VMFS heartbeat region
ON-DISK ERROR: Invalid HB address
Phase 3: Checking all file descriptors.
Phase 4: Checking pathname and connectivity.
Phase 5: Checking resource reference counts.
Total Errors Found: 1
Command options that the VOMA tool takes include the following.
Table 7‑4.
PAGE 73
vSphere Troubleshooting Problem Typically, when a storage path experiences problems, an ESXi host sends the Test Unit Ready (TUR) command to confirm that the path is down before initiating a path failover. However, if the TUR command is unsuccessful and repeatedly returns a retry operation request (VMK_STORAGE_RETRY_OPERATION), the host continues to retry the command without triggering the failover.
PAGE 74
vSphere Troubleshooting Troubleshooting Flash Devices vSphere uses flash drives for such storage features as vSAN, host swap cache, and Flash Read Cache. The troubleshooting topics can help you avoid potential problems and provide solutions for issues that you might encounter when configuring flash drives. Formatted Flash Devices Might Become Unavailable A local flash device becomes unavailable for virtual flash resource or Virtual SAN configuration when it is formatted with VMFS or any other file system.
PAGE 75
vSphere Troubleshooting Problem By default, auto-partitioning deploys VMFS file systems on any unused local storage disks on your host, including flash disks. However, a flash disk formatted with VMFS becomes unavailable for such features as virtual flash and vSAN. Both features require an unformatted flash disk and neither can share the disk with any other file system.
PAGE 76
vSphere Troubleshooting Mark Storage Devices as Flash If ESXi does not recognize its devices as flash, mark them as flash devices. ESXi does not recognize certain devices as flash when their vendors do not support automatic flash disk detection. The Drive Type column for the devices shows HDD as their type. Caution Marking the HDD devices as flash might deteriorate the performance of datastores and services that use them. Mark the devices only if you are certain that they are flash devices.
PAGE 77
vSphere Troubleshooting 5 Click Mark as Local, and click Yes to save your changes. Troubleshooting Virtual Volumes Virtual volumes are encapsulations of virtual machine files, virtual disks, and their derivatives. Virtual volumes are stored natively inside a storage system that is connected through Ethernet or SAN. They are exported as objects by a compliant storage system and are managed entirely by hardware on the storage side.
PAGE 78
vSphere Troubleshooting Cause This problem might occur when you fail to configure protocol endpoints for the SCSI-based storage container that is mapped to the virtual datastore. Like traditional LUNs, SCSI protocol endpoints need to be configured so that an ESXi host can detect them. Solution Before creating virtual datastores for SCSI-based containers, make sure to configure protocol endpoints on the storage side.
PAGE 79
Failed Attempts to Migrate VMs with Memory Snapshots to and from Virtual Datastores
When you attempt to migrate a VM with hardware version 10 or earlier to or from a vSphere Virtual Volumes datastore, failures occur if the VM has memory snapshots.
Problem
The following problems occur when you migrate a version 10 or earlier VM with memory snapshots:
- Migration of a version 10 or earlier VM with memory snapshots to a virtual datastore is not supported and causes a failure.
PAGE 80
Troubleshooting VAIO Filters
vSphere APIs for I/O Filtering (VAIO) provide a framework that allows third parties to create software components called I/O filters. The filters can be installed on ESXi hosts and can offer additional data services to virtual machines by processing I/O requests that move between the guest operating system of a virtual machine and virtual disks. For information about I/O filters, see the vSphere Storage publication.
PAGE 81
Procedure
1 Install the VIBs by running the following command:
esxcli --server=server_name software vib install --depot path_to_VMware_vib_ZIP_file
Options for the install command allow you to perform a dry run, specify a particular VIB, bypass acceptance-level verification, and so on. Do not bypass verification on production systems. See the vSphere Command-Line Interface Reference documentation.
2 Verify that the VIBs are installed on your ESXi host.
PAGE 82
8 Troubleshooting Networking
The troubleshooting topics about networking in vSphere provide solutions to potential problems that you might encounter with the connectivity of ESXi hosts, vCenter Server, and virtual machines.
This chapter includes the following topics:
- Troubleshooting MAC Address Allocation
- The Conversion to the Enhanced LACP Support Fails
- Unable to Remove a Host from a vSphere Distributed Switch
- Hosts on a vSphere Distributed Switch 5.
PAGE 83
Duplicate MAC Addresses of Virtual Machines on the Same Network
You encounter loss of packets and connectivity because virtual machines have duplicate MAC addresses generated by vCenter Server.
Problem
The MAC addresses of virtual machines on the same broadcast domain or IP subnet are in conflict, or vCenter Server generates a duplicate MAC address for a newly created virtual machine.
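One way to confirm the conflict described above is to scan an exported list of virtual machine MAC addresses for repeats. The following is a minimal sketch, not a VMware tool: the function name `find_duplicate_macs` and the sample inventory are hypothetical, and it assumes you have already gathered (VM name, MAC address) pairs by whatever means your environment provides.

```python
from collections import defaultdict


def find_duplicate_macs(vm_macs):
    """Return {mac: [vm names]} for MAC addresses used by more than one VM."""
    owners = defaultdict(list)
    for vm_name, mac in vm_macs:
        # Normalize case and separator so 00-50-56-AA-BB-CC and
        # 00:50:56:aa:bb:cc are treated as the same address.
        owners[mac.lower().replace("-", ":")].append(vm_name)
    return {mac: names for mac, names in owners.items() if len(names) > 1}


# Hypothetical inventory export: (VM name, MAC address) pairs.
inventory = [
    ("web01", "00:50:56:8a:12:34"),
    ("web02", "00:50:56:8A:12:34"),
    ("db01", "00:50:56:8a:56:78"),
]
duplicates = find_duplicate_macs(inventory)
for mac, names in duplicates.items():
    print(f"{mac} is shared by: {', '.join(names)}")
```

The check only reports addresses that appear more than once in the list you supply; it does not tell you whether the virtual machines share a broadcast domain.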
PAGE 84
- If the vCenter Server instance generates the MAC addresses of virtual machines according to the default allocation, VMware OUI, change the vCenter Server instance ID or use another allocation method to resolve conflicts.
Note: Changing the vCenter Server instance ID or switching to a different allocation scheme does not resolve MAC address conflicts in existing virtual machines.
PAGE 85
- Enforce MAC address regeneration when transferring a virtual machine between vCenter Server instances by using the virtual machine files from a datastore.
  a Power off a virtual machine, remove it from the inventory, and in its configuration file (.vmx), set the ethernetX.addressType parameter to generated. X next to ethernet stands for the sequence number of the virtual NIC in the virtual machine.
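The .vmx edit in step a can be sketched as a small text transformation. This is an illustration under assumptions, not a VMware utility: the function name `set_address_type_generated` and the sample file contents are hypothetical, the sketch assumes the usual `key = "value"` line layout of a .vmx file, and it changes only the one parameter that the step names. Work on a copy of the configuration file while the virtual machine is powered off.

```python
import re


def set_address_type_generated(vmx_text, nic_index):
    """Set ethernet<nic_index>.addressType to "generated" in .vmx text."""
    key = f"ethernet{nic_index}.addressType"
    new_line = f'{key} = "generated"'
    # Match the whole existing line for this key, whatever its current value.
    pattern = re.compile(rf"^{re.escape(key)}\s*=.*$", re.MULTILINE)
    if pattern.search(vmx_text):
        return pattern.sub(new_line, vmx_text)
    # The key is absent: append it, adding a newline first if one is needed.
    separator = "" if (not vmx_text or vmx_text.endswith("\n")) else "\n"
    return vmx_text + separator + new_line + "\n"


# Hypothetical .vmx fragment for the first virtual NIC (X = 0).
sample = (
    'ethernet0.present = "TRUE"\n'
    'ethernet0.addressType = "static"\n'
    'ethernet0.address = "00:50:56:01:02:03"\n'
)
updated = set_address_type_generated(sample, 0)
print(updated)
```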
PAGE 86
Problem
In the vSphere Web Client, after you assign a MAC address within the range 00:50:56:40:YY:ZZ – 00:50:56:7F:YY:ZZ to a virtual machine, attempts to power the virtual machine on fail with a status message that the MAC address is in conflict.
00:50:56:XX:YY:ZZ is not a valid static Ethernet address. It conflicts with VMware reserved MACs for other usage.
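Before assigning a static address, you can check whether it falls in the conflicting range named above. The sketch below is illustrative only: the function name `in_conflicting_range` is hypothetical, and it assumes the range means any address with the 00:50:56 prefix whose fourth octet lies between 40 and 7F (hexadecimal), with YY and ZZ free to take any value.

```python
def in_conflicting_range(mac):
    """Return True if mac lies in 00:50:56:40:YY:ZZ - 00:50:56:7F:YY:ZZ."""
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"expected six octets, got {mac!r}")
    octets = [int(part, 16) for part in parts]
    # The first three octets must be 00:50:56, and the fourth octet
    # must fall between 0x40 and 0x7F inclusive.
    return octets[:3] == [0x00, 0x50, 0x56] and 0x40 <= octets[3] <= 0x7F


for candidate in ("00:50:56:3F:FF:FF", "00:50:56:40:00:01", "00:50:56:7f:ab:cd"):
    state = "conflicts" if in_conflicting_range(candidate) else "outside the range"
    print(f"{candidate}: {state}")
```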
PAGE 87
Table 8‑1. Steps to Complete the Conversion to the Enhanced LACP Manually
Conversion Stage | Target Configuration State | Solution
1. Create a new LAG. | A newly created LAG must be present on the distributed switch. | Check the LACP configuration of the distributed switch and create a new LAG if there is none.
2. Create an intermediate LACP teaming and failover configuration on the distributed port groups.
PAGE 88
- Attempts to remove a host proxy switch that still exists on the host from a previous networking configuration fail. For example, you moved the host to a different data center or vCenter Server system, or upgraded the ESXi and vCenter Server software, and created a new networking configuration. When trying to remove the host proxy switch, the operation fails because resources on the proxy switch are still in use.
PAGE 89
Cause
On a vSphere Distributed Switch 5.1 and later in vCenter Server that has networking rollback disabled, the port group containing the VMkernel adapters for the management network is misconfigured in vCenter Server and the invalid configuration is propagated to the hosts on the switch.
Note: In vSphere 5.1 and later, networking rollback is enabled by default. However, you can enable or disable rollbacks at the vCenter Server level.
PAGE 90
4 Apply the configuration of the distributed port group and VMkernel adapter from vCenter Server to the host.
- Push the correct configuration of the distributed port group and VMkernel adapter from vCenter Server to the host.
  a In the vSphere Web Client, navigate to the host.
  b On the Configure tab, click Networking.
  c From the Virtual switches list, select the distributed switch and click Rectify the state of the selected distributed switch on the host.
PAGE 91
  e In the Port Group Properties section, type a network label that identifies the port group that you are creating and optionally a VLAN ID.
  f Click Finish.
4 In the vSphere Distributed Switch view, migrate the VMkernel adapter for the network to a standard switch.
  a Select the vSphere Distributed Switch view, and for the distributed switch, click Manage Virtual Adapters.
  b In the Manage Virtual Adapters wizard, select the VMkernel adapter from the list and click Migrate.
PAGE 92
Solution
Check which switch has lost uplink redundancy on the host. Connect at least one more physical NIC on the host to this switch and reset the alarm to green. You can use the vSphere Web Client or the ESXi Shell.
If a physical NIC is down, try to bring it back up by using the ESXi Shell on the host.
For information about using the networking commands in the ESXi Shell, see vSphere Command-Line Interface Reference.
PAGE 93
- Create a port group with identical settings, make it use the valid uplink number for the host, and migrate the virtual machine networking to the port group.
- Move the NIC to an uplink that participates in the active failover group. You can use the vSphere Web Client to move the host physical NIC to another uplink.
  - Use the Add and Manage Hosts wizard on the distributed switch.
    a Navigate to the distributed switch in the vSphere Web Client.
PAGE 94
Solution
1 In the vSphere Web Client, navigate to the host.
2 On the Configure tab, expand the System group of settings.
3 Select Advanced System Settings and click Edit.
4 Type the physical adapters that you want to use outside the scope of Network I/O Control as a comma-separated list for the Net.IOControlPnicOptOut parameter. For example: vmnic2,vmnic3
5 Click OK to apply the changes.
6 In the vSphere Web Client, add the physical adapter to the distributed switch.
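If you keep the adapter names in a script or inventory file, the comma-separated value for Net.IOControlPnicOptOut in step 4 can be assembled programmatically. The helper below is a hypothetical convenience, not part of any VMware tooling; it only builds the string that you would then type into the Advanced System Settings dialog.

```python
def build_pnic_opt_out(adapters):
    """Join adapter names into a comma-separated value without duplicates."""
    unique = []
    for name in adapters:
        name = name.strip()
        # Skip blanks and repeated names while keeping the original order.
        if name and name not in unique:
            unique.append(name)
    return ",".join(unique)


value = build_pnic_opt_out(["vmnic2", " vmnic3", "vmnic2"])
print(value)
```

The result contains no spaces, matching the vmnic2,vmnic3 form shown in the step.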
PAGE 95
Solution
- In the guest operating system, reset the interface to cause the passthrough network adapter to regain its valid MAC address. If the interface is configured to use DHCP for address assignment, the interface acquires an IP address automatically.
For example, on a Linux virtual machine run the ifconfig console command.
ifconfig ethX down
ifconfig ethX up
where X in ethX represents the sequence number of the virtual machine network adapter in the guest operating system.
PAGE 96
If the teaming and failover policy of the port group contains more active uplinks, the BPDU traffic is moved to the adapter for the next active uplink. The new physical switch port becomes disabled, and more workloads become unable to exchange packets with the network. Eventually, almost all entities on the ESXi host might become unreachable.
PAGE 97
- Protect the environment from DoS attacks in any case by activating the BPDU filter on the ESXi host or on the physical switch.
  - On a host running ESXi 4.1 Update 3, ESXi 5.0 Patch 04 and later 5.0 releases, and ESXi 5.1 Patch 01 and later, enable the Guest BPDU filter in one of the following ways and reboot the host:
    - In the Advanced System Settings table on the Configure tab for the host in the vSphere Web Client, set the Net.BlockGuestBPDU property to 1.
PAGE 98
Solution
- Increase the threshold in bytes at which Windows changes its behavior for UDP packets by modifying the registry of the Windows guest OS.
  a Locate the HKLM\System\CurrentControlSet\Services\Afd\Parameters registry key.
  b Add a value with the name FastSendDatagramThreshold of type DWORD equal to 1500.
  For information about fixing this issue in the Windows registry, see http://support.microsoft.com/kb/235257.
- Modify the coalescing settings of the virtual machine NIC.
PAGE 99
Action | Parameter in the vSphere Web Client | Parameter for the esxcli system settings advanced set Command | Value
Set a default interrupt rate higher than the expected packet rate. For example, set the interrupt rate to 16000 if 15000 interrupts are expected per second. | Net.CoalesceScheme | /Net/CoalesceScheme | rbc
 | Net.CoalesceParams | /Net/CoalesceParams | 16000
Disable coalescing for low throughput or latency-sensitive workloads.
PAGE 100
- In the topology of the distributed switch, check the VLAN IDs of the physical NICs that are assigned to the active uplinks on the distributed port group. On all hosts, assign physical NICs that are from the same VLAN to an active uplink on the distributed port group.
- To verify that there is no problem at the physical layer, migrate the virtual machines to the same host and check the communication between them.
PAGE 101
Solution
- Create a network protocol profile on the target data center or vCenter Server system with the required network settings and associate the protocol profile with the port group to which the vApp or virtual machine is connected.
  For example, this approach is suitable if the vApp or virtual machine is a vCenter Server extension that uses the vCenter Extension vService.
PAGE 102
Solution
- Use the vSphere Web Client to increase the timeout for rollback on vCenter Server. If you encounter the same problem again, increase the rollback timeout in 60-second increments until the operation has enough time to succeed.
  a On the Configure tab of a vCenter Server instance, expand Settings.
  b Select Advanced Settings and click Edit.
  c If the property is not present, add the config.vpxd.network.rollbackTimeout parameter to the settings.
PAGE 103
9 Troubleshooting Licensing
The troubleshooting licensing topics provide solutions to problems that you might encounter as a result of an incorrect or incompatible license setup in vSphere.
This chapter includes the following topics:
- Troubleshooting Host Licensing
- Unable to Power On a Virtual Machine
- Unable to Configure or Use a Feature
Troubleshooting Host Licensing
You might encounter different problems that result from an incompatible or incorrect license configuration of ESXi hosts.
PAGE 104
Solution
- Assign a license with larger capacity.
- Upgrade the license edition to match the resources and features on the host, or disable the features that do not match the license edition.
- Assign a vSphere license whose edition is compatible with the license edition of vCenter Server.
ESXi Host Disconnects from vCenter Server
An ESXi host might disconnect from vCenter Server or all ESXi hosts might disconnect from vCenter Server at the same time.
PAGE 105
Solution
Table 9‑1. Power on a Virtual Machine
Cause | Solution
The evaluation period of the host is expired | Assign a vSphere license to the ESXi host
The license of the host is expired | Assign a vSphere license to the ESXi host
Unable to Configure or Use a Feature
You cannot use a feature or change its configuration.
Problem
You cannot use or configure a feature and a licensing-related error message appears.