Details
- An ESX host displays a purple diagnostic screen with an error similar to:
0:08:04:55.679 cpu0:1024)VMNIX: ALERT: HB: 365: Lost heartbeat (comm=cimserver
pid=3588 t=30 to=30 clt=1).
0:08:04:55.818 cpu0:1024)Host: 4781: COS Error: Lost heartbeat
Starting coredump to disk Starting coredump to disk Dumping using slot 1 of 1...
using slot 1 of 1... Log
- The ESX host may be slow to respond on Console-based network management connections.
- The ESX host may disconnect from VirtualCenter or vCenter Server.
Solution
Lost heartbeat errors can have many different root causes.
Starting with ESX 3.5 Update 2, memory leaks have been reported in Pegasus (cimserver). Over time, these leaks cause the cimserver process to consume the available memory and swap space. Some memory leaks might appear only on specific hardware models or might depend on how frequently the CIM server is queried. This issue causes the service console to fail, which in turn causes ESX to fail with a Lost Heartbeat error message.
Note: This issue affects ESX 3.5 Update 2, Update 3, Update 4, and Update 5. All other versions of ESX (including ESXi) are not affected. This issue does not affect ESX 4.0 and ESXi 4.0 because the Pegasus component is not used.
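To check whether cimserver is accumulating memory before the service console runs out, the resident memory of the process can be sampled from the service console. The following is a minimal sketch, not part of the documented procedure: the rss_kb_for function name and the 200 MB threshold are illustrative assumptions, and it assumes a ps that accepts the -eo rss=,comm= output format.

```shell
#!/bin/sh
# Sum the resident memory (RSS, in KB) of every process whose command
# name matches the argument exactly. Prints 0 if no such process runs.
rss_kb_for() {
    ps -eo rss=,comm= | awk -v name="$1" '$2 == name { total += $1 } END { print total + 0 }'
}

# Hypothetical threshold (an assumption, not from this article): 200 MB.
LIMIT_KB=204800

used_kb=$(rss_kb_for cimserver)
if [ "$used_kb" -gt "$LIMIT_KB" ]; then
    echo "cimserver is using ${used_kb} KB of resident memory (limit ${LIMIT_KB} KB)"
else
    echo "cimserver resident memory: ${used_kb} KB"
fi
```

Running the script at intervals and comparing the reported values gives a rough indication of whether memory use keeps growing. A suitable threshold depends on the memory assigned to the service console.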
To work around this issue, periodically restart the Pegasus service so that the memory it has accumulated is freed.
To schedule a daily service restart at midnight:
- Log in to the ESX host as root at the console or via SSH. For more information, see Unable to connect to an ESX host using Secure Shell (SSH) (1003807).
- At the root shell prompt, run the following command to edit the root crontab:
crontab -e
Note: This opens a vi editor.