Pegasus (cimserver) memory leaks reported in ESX 3.5 Update 2 and later

Details

  • An ESX host displays a purple diagnostic screen with an error similar to:

    0:08:04:55.679 cpu0:1024)VMNIX: ALERT: HB: 365: Lost heartbeat (comm=cimserver pid=3588 t=30 to=30 clt=1).
    0:08:04:55.818 cpu0:1024)Host: 4781: COS Error: Lost heartbeat
    Starting coredump to disk Starting coredump to disk Dumping using slot 1 of 1... using slot 1 of 1... Log

  • The ESX host may be slow to respond over network management connections to the service console.
  • The ESX host may disconnect from VirtualCenter or vCenter Server.
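Before applying the workaround, it can help to confirm that a saved alert actually names cimserver. The following is a minimal sketch, assuming the alert text has the same form as the example above and that its comm= field names the implicated service console process. heartbeat_comm is a hypothetical helper, not a VMware tool, and the input file name in the usage comment is illustrative.

```shell
#!/bin/sh
# Sketch: print the comm= value of each "Lost heartbeat" alert read on stdin.
# Assumption: alerts look like the example in this article, e.g.
#   ... VMNIX: ALERT: HB: 365: Lost heartbeat (comm=cimserver pid=3588 ...
# heartbeat_comm is a hypothetical helper name.
heartbeat_comm() {
  # Keep only lines containing "Lost heartbeat (comm=" and print the name
  # that follows, stopping at the first space or closing parenthesis.
  sed -n 's/.*Lost heartbeat (comm=\([^ )]*\).*/\1/p'
}

# Usage (file name is hypothetical):
#   heartbeat_comm < saved-alert.txt
# An output of "cimserver" points at the Pegasus leak described here.
```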

Solution

Lost heartbeat errors can have many different root causes.

Starting with ESX 3.5 Update 2, memory leaks have been reported in Pegasus (cimserver), the CIM server that runs in the service console. Over time, these leaks cause the cimserver process to consume all available service console memory and swap space. Some leaks occur only on specific hardware models or depend on how frequently the CIM server is queried. When memory is exhausted, the service console fails, which in turn causes the ESX host to fail with a Lost heartbeat error.
 
Note: This issue affects ESX 3.5 Update 2, Update 3, Update 4, and Update 5. All other versions of ESX, including ESXi, are not affected; ESX 4.0 and ESXi 4.0 do not use the Pegasus component. 
 
To work around this issue, periodically restart the Pegasus service so that the memory it has leaked is freed.
 
To schedule a daily service restart at midnight:
  1. Log in to the ESX host as root at the console or using SSH. For more information, see Unable to connect to an ESX host using Secure Shell (SSH) (1003807).
  2. At the root shell prompt, run the following command to edit the root crontab:

    crontab -e

    Note: This opens the root crontab in the vi editor.
  3. Press i to enter insert mode, then add the following line:

      0 0 * * * /etc/init.d/pegasus restart

      Note: To run the restart at a different time, set the five crontab time fields as follows:

      *    *    *    *    *    command to be executed
      -    -    -    -    -
      |    |    |    |    |
      |    |    |    |    +----- day of week (0 - 6) (Sunday=0)
      |    |    |    +---------- month (1 - 12)
      |    |    +--------------- day of month (1 - 31)
      |    +-------------------- hour (0 - 23)
      +------------------------- minute (0 - 59)

  4. Press Esc, then save and quit the crontab editor by typing the following and pressing Enter:

      :wq
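As an alternative to editing the crontab in vi, the restart entry shown above could be added non-interactively. This is a minimal sketch, assuming it runs as root on the ESX 3.5 service console and that the local crontab command accepts "-" to read a new crontab from standard input. add_pegasus_entry and the install argument are hypothetical names chosen for illustration; SCHEDULE uses the same five time fields described in the note above.

```shell
#!/bin/sh
# Sketch: append the Pegasus restart entry to root's crontab without vi.
# Assumptions: run as root on the service console; "crontab -" reads the
# new crontab from stdin. add_pegasus_entry is a hypothetical helper.

SCHEDULE="0 0 * * *"   # daily at midnight; "0 */12 * * *" would be every 12 hours
ENTRY="$SCHEDULE /etc/init.d/pegasus restart"

# Copy the current crontab (stdin) to stdout and append ENTRY only if no
# existing line already mentions the restart command.
add_pegasus_entry() {
  current=$(cat)
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  case "$current" in
    *"/etc/init.d/pegasus restart"*) ;;   # already present: leave unchanged
    *) printf '%s\n' "$ENTRY" ;;
  esac
}

# Only modify the real crontab when called with the "install" argument.
if [ "${1:-}" = "install" ]; then
  crontab -l 2>/dev/null | add_pegasus_entry | crontab -
fi
```

Saved under a name of your choosing (for example, the hypothetical add-pegasus-cron.sh), it would be run once as "sh add-pegasus-cron.sh install"; running it again leaves the crontab unchanged.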
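VMware recommends monitoring the memory usage of cimserver and adjusting the frequency of the cron job to match. To check the size of the cimserver process, run the following command, or use the top command:

    ps -A --no-headers --format fname,rss --sort=-rss | grep cimserve

The rss column is the resident memory of the process in kilobytes. The search string is cimserve rather than cimserver because the fname column shows only the first 8 characters of the process name. Depending on how quickly cimserver leaks memory on your system, schedule the Pegasus restart more or less frequently.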
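The monitoring advice above could also be automated so that Pegasus is restarted only when cimserver has grown past a chosen size. This is a minimal sketch, assuming it runs as root on the service console and reuses the ps command above. LIMIT_KB is a hypothetical placeholder threshold that you would set after watching your own host, and cimserver_rss_kb, over_limit, and the check argument are illustrative names, not part of ESX.

```shell
#!/bin/sh
# Sketch: restart Pegasus only when cimserver's resident memory exceeds a limit.
# Assumptions: run as root on the ESX 3.5 service console. LIMIT_KB is a
# hypothetical placeholder; choose a value based on your own monitoring.
LIMIT_KB=${LIMIT_KB:-100000}

# Print the RSS (KB) of the first "cimserve" line in "fname rss" input.
# With ps --sort=-rss, the first match is the largest cimserver process.
cimserver_rss_kb() {
  awk '$1 == "cimserve" { print $2; exit }'
}

# Succeed when the RSS value given as $1 is non-empty and above LIMIT_KB.
over_limit() {
  [ -n "$1" ] && [ "$1" -gt "$LIMIT_KB" ]
}

# Only act on the live system when called with the "check" argument.
if [ "${1:-}" = "check" ]; then
  rss=$(ps -A --no-headers --format fname,rss --sort=-rss | cimserver_rss_kb)
  if over_limit "$rss"; then
    echo "cimserver RSS ${rss} KB exceeds ${LIMIT_KB} KB; restarting Pegasus"
    /etc/init.d/pegasus restart
  fi
fi
```

Instead of a fixed daily restart, a crontab entry such as the following would run the check every 30 minutes; the script path is hypothetical:

    */30 * * * * /bin/sh /root/cimserver-check.sh check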
     
Note: While the CIM server restarts, the Health Status view on the Configuration tab of the VMware Infrastructure Client might not display any information.

Update History

04/09/2010 - Updated the command to monitor the size of the cimserver process.
04/29/2010 - Described how to use crontab to specify other run times.

Based on VMware KB 1009607
