When intermittent network issues and intermittent SAN issues occur at the same time, they can produce conditions such as an unreachable SAN or high latency. In this situation, VMware HA might power on the same virtual machine on a second ESX host and leave the virtual machine registered on two different ESX hosts.
Normally, when there are intermittent SAN issues, an ESX host can temporarily lose the locks on the .vmdk files of its powered-on virtual machines. Usually the host times out and, after a while, reclaims the locks. The host fails to reclaim a lock only when the same virtual machine has been powered on from another ESX host in the meantime. This is unusual: it happens only when network issues cause HA to restart the virtual machine on another host.
So, when intermittent network issues and intermittent SAN issues coincide, the second ESX host takes over the lock when it starts the virtual machine. The copy of the virtual machine still running on the original ESX host stops responding and becomes inaccessible, but the virtual machine is shown as registered on both ESX hosts in VirtualCenter.
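The visible symptom described above is the same virtual machine registered on two ESX hosts. The following is a minimal Python sketch of how this could be checked against an inventory snapshot. It assumes a hypothetical mapping of host name to registered .vmx paths; the function name, host names, and paths are illustrative only, and collecting the mapping from VirtualCenter is outside the scope of the sketch.

```python
from collections import defaultdict


def find_duplicate_registrations(inventory):
    """Return {vmx_path: sorted host names} for VMs registered on more than one host.

    `inventory` is a hypothetical dict mapping a host name to the list of
    .vmx paths registered on that host.
    """
    hosts_by_vm = defaultdict(set)
    for host, vmx_paths in inventory.items():
        for path in vmx_paths:
            hosts_by_vm[path].add(host)
    # Keep only the VMs that appear on two or more hosts.
    return {
        path: sorted(hosts)
        for path, hosts in hosts_by_vm.items()
        if len(hosts) > 1
    }


# Made-up snapshot: web01 is registered on both hosts.
inventory = {
    "esx01": ["[san1] web01/web01.vmx", "[san1] db01/db01.vmx"],
    "esx02": ["[san1] web01/web01.vmx", "[san1] app01/app01.vmx"],
}

duplicates = find_duplicate_registrations(inventory)
for path, hosts in duplicates.items():
    print(f"{path} is registered on: {', '.join(hosts)}")
```

A virtual machine reported by such a check is only a candidate for this issue. A duplicate registration alone does not show which host currently holds the .vmdk locks.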
This issue is fixed in the following patches:
Product Version | Patch Name | KB Number
ESX 3.5 | ESX350-200811401-SG | 1007501
ESXi 3.5 | ESXe350-200811401-I-SG | 1007507
ESX 3.0.3 | ESX303-200811401-BG | 1006986
ESX 3.0.2 | ESX-1006980 | 1006980
With this fix, the virtual machine running on the original ESX host displays a dialog box stating that the locks have been lost and that the virtual machine will shut down automatically.
Note: This fix is already included in the 4.x products.
Based on VMware KB 1006936