Pegasus (cimserver) memory leaks reported in ESX 3.5 Update 2 and later

Details

An ESX host displays a purple diagnostic screen with an error similar to:

0:08:04:55.679 cpu0:1024)VMNIX: ALERT: HB: 365: Lost heartbeat (comm=cimserver pid=3588 t=30 to=30 clt=1).
0:08:04:55.818 cpu0:1024)Host: 4781: COS Error: Lost heartbeat
Starting coredump to disk Starting coredump to disk Dumping using slot 1 of 1...
using slot 1 of 1... Log
The ESX host may be slow to respond on Console-based network management connections.
The ESX host may disconnect from VirtualCenter or vCenter Server.

Solution

Lost heartbeat errors can have many different root causes.

In ESX 3.5 Update 2 and later, memory leaks are reported in Pegasus (cimserver). Over time, these leaks cause the cimserver process to consume the available memory and swap space. Some memory leaks might appear only on specific hardware models or might depend on how frequently the CIM server is queried. This issue causes the service console to fail, and ESX then fails with a Lost Heartbeat error message.

Note: This issue affects ESX 3.5 Update 2, Update 3, Update 4, and Update 5. All other versions of ESX (including ESXi) are not affected. ESX 4.0 and ESXi 4.0 are not affected because they do not use the Pegasus component.

To work around this issue, periodically restart the Pegasus service so that the memory it has leaked is freed.

To schedule a daily service restart at midnight:

Log in to the ESX host as root at the console or via SSH. For more information, see Unable to connect to an ESX host using Secure Shell (SSH) (1003807).
At the root shell prompt, run the following command to edit the root crontab:

crontab -e

Note: This opens a vi editor.
Add the following line:

0 0 * * * /etc/init.d/pegasus restart

Note: To specify a different run time, use the following crontab field format:

*     *     *   *    *        command to be executed
-     -     -   -    -
|     |     |   |    |
|     |     |   |    +----- day of week (0 - 6) (Sunday=0)
|     |     |   +------- month (1 - 12)
|     |     +--------- day of month (1 - 31)
|     +----------- hour (0 - 23)
+------------- min (0 - 59)

Save and quit the crontab editor by pressing Esc and typing:

:wq
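
As a sketch of the crontab field layout described in the note above, the following entries show two alternative schedules. The specific times are illustrative assumptions, not VMware recommendations; only the restart command itself comes from this article.

```
# Illustrative schedules only; pick times that suit your environment.
# Restart Pegasus twice a day, at 00:00 and 12:00.
0 0,12 * * * /etc/init.d/pegasus restart
# Restart Pegasus once a week, on Sunday at 02:30.
30 2 * * 0 /etc/init.d/pegasus restart
```

Each entry is an alternative to the daily midnight line, so normally only one of them would be present in the root crontab.
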

VMware recommends that you monitor the memory usage of cimserver and adjust the cron job schedule accordingly. You can monitor the size of the cimserver process with the top command or with the following command:

ps -A --no-headers --format fname,rss --sort=-rss | grep cimserve

The pattern cimserve is intentional: the fname column shows only the first 8 characters of the process name. Restart Pegasus more or less frequently, depending on how fast it leaks on your system.
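
The monitoring advice above could be automated along the following lines. This is a minimal sketch, not a VMware-provided script: the 200 MB threshold, the THRESHOLD_KB variable, the function names, and the --run switch are all assumptions made for illustration, and the script assumes awk is available in the service console.

```shell
#!/bin/sh
# Hypothetical threshold in kilobytes (ps reports RSS in KB); tune per host.
THRESHOLD_KB="${THRESHOLD_KB:-204800}"

# Sum the RSS column of every "cimserve" row read from stdin.
# Expected input format: output of `ps -A --no-headers --format fname,rss`.
# fname is truncated to 8 characters, hence "cimserve" rather than "cimserver".
sum_cimserver_rss() {
    awk '$1 == "cimserve" { total += $2 } END { print total + 0 }'
}

# Succeeds (exit status 0) when the given RSS value exceeds the threshold.
over_threshold() {
    [ "$1" -gt "$THRESHOLD_KB" ]
}

# Act only when invoked with --run, so the functions can be tested safely.
if [ "${1:-}" = "--run" ]; then
    rss=$(ps -A --no-headers --format fname,rss | sum_cimserver_rss)
    if over_threshold "$rss"; then
        /etc/init.d/pegasus restart
    fi
fi
```

Saved as, for example, /root/check_cimserver.sh (a hypothetical path), the script could be run hourly from the root crontab with an entry such as 0 * * * * /bin/sh /root/check_cimserver.sh --run, so that Pegasus is restarted only when its memory use has actually grown.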

Note: In VMware Infrastructure Client, the Health Status in the Configuration tab might not display any information during the restart of the CIM server.

Update History

04/09/2010 - Updated command to monitor the size of the cimserver process.
04/29/2010 - Described how to use crontab to specify other run times.
Based on VMware KB 1009607
