Details
- ESXi does not experience this issue. One of the prerequisites for IRQ sharing problems is that a device be owned by the service console, and ESXi has no service console.
- This is not an issue in ESX 4.0, because IRQs are handled directly by VMkernel (so no conflicts can occur in the service console).
Solution
Understanding IRQ Sharing in ESX Server
Note: This article is intended to help you identify possible IRQ sharing issues and potential solutions. For detailed information about IRQs and IRQ specifications, contact your system vendor.
In ESX Server 3.5, devices can be owned either by the service console or by VMkernel. Interrupt processing overhead for devices owned by the service console is much higher than for devices owned by VMkernel because of an extra context switch needed for interrupt processing. Because of this, devices with high interrupt rates should not be assigned to the service console.
IRQ sharing between two low-interrupt-rate devices has little effect on performance when both devices are owned by the service console or both are owned by VMkernel. However, when one of the devices sharing the IRQ is owned by the service console and the other is owned by VMkernel, the performance impact can be significant. The impact is more severe (and more easily observed) when the interrupt rate of either device is high.
The performance impact has two causes:
- Interrupt lines shared between the service console and VMkernel incur higher overhead because of extra context switches. When a shared interrupt is raised, VMkernel has no direct way of determining which device caused it, so the CPU runs the interrupt service routines of all devices on that line in sequence until it finds the one that raised the interrupt. When all the sharing devices are owned by VMkernel, running this chain takes little time. When the IRQ is shared between VMkernel and the service console, however, running the chain requires context switches on every interrupt, which has a significant performance impact.
- IRQ sharing limits interrupt processing to a single CPU. ESX Server is designed to make full use of the available hardware resources for optimal performance. Under normal conditions, interrupt processing is fanned out across the cores in the system, with the least-utilized core selected. However, when the IRQ of a VMkernel device is shared with a device owned by the service console, interrupt processing for that IRQ is restricted to CPU #0. For devices with high interrupt rates, CPU #0 becomes a processing bottleneck. The problem is aggravated by the fact that the service console also runs on CPU #0, even though its own resource consumption is minimal.
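The two causes can be illustrated with a toy model. The following Python sketch (assuming Python 3.7 or later; it is not ESX code and does not reflect VMkernel internals) runs every handler on a shared line in order until it finds the device that raised the interrupt. It counts each crossing between VMkernel and the service console as one context switch and applies the CPU #0 restriction to mixed-ownership lines. The `Device` class, the `dispatch` and `pick_cpu` functions, the device names, and the rule that an interrupt is taken in VMkernel first are hypothetical simplifications chosen for illustration.

```python
"""Toy model of IRQ sharing (hypothetical; not VMkernel code)."""
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

VMKERNEL = "vmkernel"
CONSOLE = "console"


@dataclass
class Device:
    name: str
    owner: str             # VMKERNEL or CONSOLE
    pending: bool = False  # True if this device raised the interrupt


def dispatch(line: List[Device]) -> Tuple[Optional[str], int, int]:
    """Run handlers on a shared line in order until the source is found.

    Returns (source device name, handlers run, context switches). A context
    switch is counted each time the chain crosses the VMkernel/service
    console boundary, including the final return to VMkernel.
    """
    current = VMKERNEL  # assumption: the interrupt is taken in VMkernel first
    switches = 0
    handlers_run = 0
    source = None
    for dev in line:
        if dev.owner != current:
            switches += 1
            current = dev.owner
        handlers_run += 1
        if dev.pending:        # this handler recognizes its own interrupt
            dev.pending = False
            source = dev.name
            break
    if current != VMKERNEL:
        switches += 1          # switch back to VMkernel when done
    return source, handlers_run, switches


def pick_cpu(line: Sequence[Device], cpu_load: Sequence[float]) -> int:
    """Mixed ownership pins the line to CPU 0; otherwise use the least-loaded CPU."""
    if len({dev.owner for dev in line}) > 1:
        return 0
    return min(range(len(cpu_load)), key=lambda cpu: cpu_load[cpu])


if __name__ == "__main__":
    mixed = [Device("vmnic2", VMKERNEL), Device("usb-uhci", CONSOLE),
             Device("vmhba1", VMKERNEL, pending=True)]
    print(dispatch(mixed))                        # ('vmhba1', 3, 2)
    print(pick_cpu(mixed, [0.9, 0.1, 0.2, 0.3]))  # 0
```

If both devices on the line are owned by VMkernel, the model counts no boundary crossings and steers the line to the least-loaded CPU. Adding a service console device to the line adds crossings on every interrupt and forces CPU 0.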
Determining if IRQ Sharing Issues Affect Your System's Performance
The tell-tale sign of IRQ sharing between VMkernel and the service console is a high number of interrupts serviced by PCPU0 (CPU #0 on the physical host) while the other CPUs are relatively lightly loaded. The high interrupt rate can sometimes render the service console unusable and cause large variations in ESX Server performance.
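To make this sign concrete, the following Python sketch (assuming Python 3) compares two samples of per-CPU interrupt totals and reports what fraction of the new interrupts CPU 0 serviced. It assumes you have already extracted the per-CPU totals, for example from the per-CPU columns of an interrupt listing. That extraction step is not shown because no output format is assumed here. The 0.8 threshold and the sample numbers are made-up illustrations, not VMware guidance.

```python
"""Sketch: does CPU 0 service most interrupts? (hypothetical heuristic)"""
from typing import Sequence

# Hypothetical cut-off; there is no documented value. Tune for your host.
DEFAULT_THRESHOLD = 0.8


def cpu0_share(before: Sequence[int], after: Sequence[int]) -> float:
    """Fraction of the interrupts between two samples that CPU 0 serviced."""
    if len(before) != len(after):
        raise ValueError("both samples must cover the same CPUs")
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return deltas[0] / total if total else 0.0


def looks_pinned_to_cpu0(before: Sequence[int], after: Sequence[int],
                         threshold: float = DEFAULT_THRESHOLD) -> bool:
    """True if CPU 0's share of the new interrupts meets the threshold."""
    return cpu0_share(before, after) >= threshold


if __name__ == "__main__":
    # Made-up per-CPU totals from two samples taken some seconds apart.
    print(looks_pinned_to_cpu0([1000, 500, 500, 500], [91000, 600, 650, 550]))  # True
    print(looks_pinned_to_cpu0([0, 0, 0, 0], [100, 120, 90, 110]))              # False
```

The sketch uses deltas between two samples rather than raw totals, because cumulative counts since boot can hide a problem that only appears under the current load.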
The most common service console device involved in IRQ sharing is the USB controller. On the VMkernel side, network and storage controllers are the most susceptible. IRQ sharing is more common with dual- and quad-port NICs and storage HBAs than with single-port controllers, but it is not restricted to these devices. The effect is more visible under I/O-intensive loads (with high interrupt rates), and it can appear either as variation in performance or as an absolute drop in ESX Server performance.
To determine whether your setup suffers from IRQ sharing, list the IRQ assignments in VMkernel by typing the following at the service console command line:
> cat /proc/vmware/interrupts
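This lists interrupt usage by VMkernel devices.
Next, list the IRQ lines assigned to devices in the service console:
> cat /proc/interrupts
Compare the two lists. An IRQ that is used both by a VMkernel device and by a service console device is shared between VMkernel and the service console. For example, if IRQ 18 and IRQ 19 are both in use by usb-uhci (a USB Universal Host Controller device) in the service console, any VMkernel device on IRQ 18 or IRQ 19 shares its interrupt with the service console.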
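The comparison can be partly automated. The following Python sketch (assuming Python 3.6 or later; the ESX 3.5 service console may not provide it, so you can save the output of cat /proc/interrupts and run the script elsewhere) parses Linux /proc/interrupts text into a map from IRQ number to device names. It then intersects that map with VMkernel IRQ numbers that you read by hand from /proc/vmware/interrupts. The VMkernel output is not parsed because its format is not assumed here. The script name irqshare.py and the function names are hypothetical.

```python
"""Sketch: find VMkernel IRQs that are also used by service console devices."""
import sys
from typing import Dict, Iterable, List


def parse_proc_interrupts(text: str) -> Dict[int, List[str]]:
    """Map each numbered IRQ in Linux /proc/interrupts text to its device names.

    Assumes the classic layout: a header of CPU names, then lines of the form
    "<irq>: <one count per CPU> <controller type> <dev>[, <dev>...]".
    """
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())             # header: CPU0 CPU1 ...
    irqs: Dict[int, List[str]] = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():              # skip NMI, LOC, ERR, ...
            continue
        fields = rest.split()
        names = " ".join(fields[ncpu + 1:])  # drop the counts and controller type
        irqs[int(label)] = [n.strip() for n in names.split(",") if n.strip()]
    return irqs


def shared_with_console(console: Dict[int, List[str]],
                        vmkernel_irqs: Iterable[int]) -> Dict[int, List[str]]:
    """Return the console devices on each VMkernel IRQ that the console also uses."""
    return {irq: console[irq] for irq in sorted(set(vmkernel_irqs)) if irq in console}


if __name__ == "__main__":
    # Usage: cat /proc/interrupts | python3 irqshare.py 18 19
    # (the arguments are VMkernel IRQ numbers read from /proc/vmware/interrupts)
    console_irqs = parse_proc_interrupts(sys.stdin.read())
    for irq, devs in shared_with_console(console_irqs, map(int, sys.argv[1:])).items():
        print(f"IRQ {irq} is also used by service console device(s): {', '.join(devs)}")
```

Reading from standard input lets the same script work on live output or on a saved copy.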
Resolving the IRQ Sharing Issue
To resolve IRQ sharing conflicts:
- Disable the problematic device, if it is not in use.
- Move the device to a different PCI slot.
- Coalesce interrupt processing for service console devices (ESX 3.5 Update 5 only).
Note: Before disabling a USB controller, check with your hardware vendor whether it is used by a remote access card.
Disabling the Device
- Prevent the service console from loading the driver for the device, which also resolves the interrupt sharing. To do so, remove the references to the driver from the file /etc/modules.conf.
- On certain systems, disable the USB controllers in the BIOS. This also prevents the USB drivers from loading on subsequent reboots.
- On hardware known to have interrupt sharing problems, install ESXi instead of ESX Server. ESXi has no service console, so it avoids the issue.
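Editing /etc/modules.conf by hand is usually enough. The following Python sketch (assuming Python 3 is available where you edit the file) shows one cautious way to do it. It comments out every active line that names the driver as a whole token instead of deleting those lines, and it keeps a .bak copy, so the change is easy to revert. The script name disable_module.py is hypothetical. usb-uhci is the driver named in the example above, so check which USB drivers your modules.conf actually references.

```python
"""Sketch: comment out the lines in modules.conf that reference a driver."""
import re
import shutil
import sys


def disable_module(conf_text: str, module: str) -> str:
    """Prefix '# ' to every active line that names `module` as a whole token."""
    # Module names may contain '-', so letters, digits, '_' and '-' count as token chars.
    pattern = re.compile(r"(?<![\w-])" + re.escape(module) + r"(?![\w-])")
    out = []
    for line in conf_text.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped and not stripped.startswith("#") and pattern.search(line):
            out.append("# " + line)
        else:
            out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Usage: python3 disable_module.py usb-uhci   (run as root; keeps a .bak copy)
    path = "/etc/modules.conf"
    shutil.copy2(path, path + ".bak")
    with open(path) as f:
        new_text = disable_module(f.read(), sys.argv[1])
    with open(path, "w") as f:
        f.write(new_text)
```

The whole-token match keeps a name such as usb-uhci from also matching a longer, unrelated module name that merely starts with it.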
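Moving the Device
Move the VMkernel device to a different PCI slot so that it can be assigned an IRQ that is not shared with a service console device. Then repeat the checks above to confirm that the IRQ is no longer shared.
Coalescing Service Console Device Interrupts (ESX 3.5 Update 5 only)
To view or set the Irq.IRQNumHostPend option in the VI Client:
- Select the ESX host and click the Configuration tab.
- Click Advanced Settings.
- View or set Irq.IRQNumHostPend.
Alternatively, view or set the option from the service console command line:
> esxcfg-advcfg --get /Irq/IRQNumHostPend
> esxcfg-advcfg --set <value> /Irq/IRQNumHostPend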
