Details
- IRQ sharing issues can occur only when a device is owned by the service console. ESXi has no service console, so it does not experience this issue.
- This is not an issue in ESX 4.0 as IRQs are loaded directly into VMkernel (so no conflicts can occur in the service console).
Solution
Understanding IRQ Sharing in ESX Server
Note: This article is intended to help you identify possible IRQ sharing issues and potential solutions. For detailed information on IRQs and IRQ specifications, contact the system vendor.
In ESX Server 3.5, devices can be owned either by the service console or by VMkernel. Interrupt processing overhead for devices owned by the service console is much higher than for devices owned by VMkernel because of an extra context switch needed for interrupt processing. Because of this, devices with high interrupt rates should not be assigned to the service console.
IRQ sharing has little effect on performance when it occurs between two devices with low interrupt rates that are both owned by the service console or both owned by VMkernel. However, when one of the sharing devices is owned by the service console and the other is owned by VMkernel, the performance impact can be significant. The impact is more severe (and more easily observed) when the interrupt rate on either device is high.
The performance impact is due to two reasons:
- Interrupt lines shared between the service console and VMkernel result in higher overhead due to extra context switches. When a shared interrupt is issued, the VMkernel has no direct way of determining which device caused the interrupt. The CPU then runs the interrupt service routines for all devices using that interrupt sequentially until it finds the device that caused the interrupt. When all the sharing devices are owned by the VMkernel, running this chain of interrupt routines does not take much time. However, when an IRQ is shared between VMkernel and the service console, executing the sequence of interrupt routines results in context switches on each interrupt. This has a significant performance impact.
- IRQ sharing limits interrupt processing to a single CPU. ESX Server is designed to make full use of the available hardware resources for optimal performance. Under normal conditions, interrupt processing is fanned out to different cores on the system, selecting the core that is least utilized. However, when the IRQ for a device in VMkernel is shared with that of a device owned by the service console, interrupt processing is limited to CPU #0. For devices with high interrupt rates, CPU #0 becomes a processing bottleneck. The problem is aggravated by the fact that the service console also runs on CPU #0, even though its resource consumption is minimal.
Determining if IRQ Sharing Issues Affect Your System's Performance
The tell-tale sign of IRQ sharing between VMkernel and the service console is a high number of interrupts being serviced by PCPU0 (CPU #0 on the physical host) while the other CPUs are relatively lightly loaded. The high interrupt rates might sometimes render the service console unusable and cause a high variation in ESX Server performance.
The most common service console device that causes IRQ sharing is the USB controller. On the VMkernel side, the network and storage controllers are susceptible to IRQ sharing. IRQ sharing is more common in dual- and quad-port NICs and storage HBAs than in single port controllers. However, IRQ sharing is not restricted only to these devices. The effect is more visible under I/O intensive loads (with high interrupt rates). The problem might manifest itself as variation in performance or as an absolute drop in ESX Server performance.
To determine if your setup suffers from IRQ sharing, list the IRQ assignment in VMkernel by typing the following at the command-line of the service console:
> cat /proc/vmware/interrupts
This lists the interrupt usage. The output looks similar to the following example:
Vector PCPU 0 PCPU 1 PCPU 2 PCPU 3
0x21: 0 0 0 0 VMK ACPI Interrupt
0x29: 1 0 0 0 <COS irq 1 (ISA edge)>, VMK keyboard
0x31: 4 0 0 0 <COS irq 3 (ISA edge)>
0x39: 4 0 0 0 <COS irq 4 (ISA edge)>
0x41: 0 0 0 0 <COS irq 6 (ISA edge)>
0x49: 0 0 0 0 <COS irq 7 (ISA edge)>
0x51: 0 0 0 0 <COS irq 8 (ISA edge)>
0x59: 0 0 0 0 <COS irq 12 (ISA edge)>
0x61: 0 0 0 0 <COS irq 13 (ISA edge)>
0x69: 43762 0 0 0 COS irq 14 (ISA edge)
0x71: 0 0 0 0 <COS irq 15 (ISA edge)>
0x79: 7917 542 3583 8292 <COS irq 16 (PCI level)>, VMK aic79xx
0x81: 1544 0 0 0 COS irq 17 (PCI level), VMK aic79xx
0x89: 1212177 0 0 0 COS irq 19 (PCI level), VMK vmnic1
0x91: 90997 0 0 0 COS irq 18 (PCI level), VMK vmnic0
0x99: 152 0 0 0 <COS irq 20 (PCI level)>, VMK qla2300
0xdf: 8904447 10869326 11006262 10844861 VMK timer
0xe1: 74 5582 12007 16005 VMK monitor
0xe9: 60854 443843 477389 509724 VMK resched
0xec: 0 0 0 0 VMK ucodeUpdate
0xf1: 3 40 68 100 VMK tlb
0xf9: 243265 0 0 0 VMK noop
0xfc: 0 0 0 0 VMK thermal
0xfd: 0 0 0 0 VMK lint1
0xfe: 0 0 0 0 VMK error
0xff: 0 0 0 0 VMK spurious
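The tell-tale sign described above (a vector shared by the service console and VMkernel whose interrupts land almost entirely on PCPU 0) can be checked mechanically. The following is a minimal Python sketch, assuming the listing has been saved as text in the format shown above. The function names and the min_total and min_share thresholds are illustrative assumptions, not part of ESX.

```python
# Rows copied from the example listing above (a subset, for illustration).
SAMPLE = """\
Vector PCPU 0 PCPU 1 PCPU 2 PCPU 3
0x29: 1 0 0 0 <COS irq 1 (ISA edge)>, VMK keyboard
0x69: 43762 0 0 0 COS irq 14 (ISA edge)
0x79: 7917 542 3583 8292 <COS irq 16 (PCI level)>, VMK aic79xx
0x81: 1544 0 0 0 COS irq 17 (PCI level), VMK aic79xx
0x89: 1212177 0 0 0 COS irq 19 (PCI level), VMK vmnic1
0x91: 90997 0 0 0 COS irq 18 (PCI level), VMK vmnic0
0x99: 152 0 0 0 <COS irq 20 (PCI level)>, VMK qla2300
0xf9: 243265 0 0 0 VMK noop
"""

def parse_vmk_interrupts(text):
    """Parse /proc/vmware/interrupts text into a list of row dicts."""
    rows, n_pcpus = [], 0
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Vector"):
            # The header tells us how many per-PCPU counters each row has.
            n_pcpus = line.count("PCPU")
            continue
        if n_pcpus == 0 or not line.startswith("0x") or ":" not in line:
            continue  # skip blank lines, "..." and anything before the header
        vector, rest = line.split(":", 1)
        fields = rest.split()
        if len(fields) < n_pcpus:
            continue
        rows.append({
            "vector": vector,
            "counts": [int(f) for f in fields[:n_pcpus]],
            "desc": " ".join(fields[n_pcpus:]),
        })
    return rows

def pcpu0_heavy(rows, min_total=1000, min_share=0.95):
    """Return rows naming both a COS irq and a VMK device whose
    interrupts are concentrated on PCPU 0 (thresholds are assumptions)."""
    flagged = []
    for row in rows:
        total = sum(row["counts"])
        shared = "COS irq" in row["desc"] and "VMK " in row["desc"]
        if shared and total >= min_total and row["counts"][0] / total >= min_share:
            flagged.append(row)
    return flagged

if __name__ == "__main__":
    for row in pcpu0_heavy(parse_vmk_interrupts(SAMPLE)):
        print(row["vector"], row["desc"])
```

A flagged vector is only a candidate. Whether a service console device is really using that IRQ still has to be confirmed against the service console's own interrupt listing.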
> cat /proc/interrupts
This lists the IRQ lines assigned to the devices in the service console.
CPU0
0: 1744046 vmnix-edge timer
1: 3 vmnix-edge keyboard
2: 163071 vmnix-edge VMnix interrupt
14: 44022 vmnix-edge ide0
17: 1499 vmnix-level usb-uhci, ehci-hcd
18: 91236 vmnix-level usb-uhci
19: 1303228 vmnix-level usb-uhci
NMI: 0
LOC: 0
ERR: 0
MIS: 0
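You can see that both IRQ 18 and IRQ 19 are in use by usb-uhci, the driver for the USB Universal Host Controller Interface (UHCI) device. In the VMkernel listing, the same IRQs are assigned to vmnic0 and vmnic1, so both network adapters share an interrupt line with the USB controller in the service console, and all of their interrupts are serviced by PCPU 0.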
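The comparison of the two listings can also be sketched in code. The following Python example assumes both outputs have been saved as text in the formats shown above, and that the service console listing has a single CPU0 column as in the example. The function names and the min_count threshold (used to ignore nearly idle lines such as the keyboard) are illustrative assumptions.

```python
import re

# Rows copied from the two example listings above (subsets, for illustration).
COS_SAMPLE = """\
CPU0
1: 3 vmnix-edge keyboard
14: 44022 vmnix-edge ide0
17: 1499 vmnix-level usb-uhci, ehci-hcd
18: 91236 vmnix-level usb-uhci
19: 1303228 vmnix-level usb-uhci
NMI: 0
"""

VMK_SAMPLE = """\
Vector PCPU 0 PCPU 1 PCPU 2 PCPU 3
0x29: 1 0 0 0 <COS irq 1 (ISA edge)>, VMK keyboard
0x69: 43762 0 0 0 COS irq 14 (ISA edge)
0x79: 7917 542 3583 8292 <COS irq 16 (PCI level)>, VMK aic79xx
0x81: 1544 0 0 0 COS irq 17 (PCI level), VMK aic79xx
0x89: 1212177 0 0 0 COS irq 19 (PCI level), VMK vmnic1
0x91: 90997 0 0 0 COS irq 18 (PCI level), VMK vmnic0
0x99: 152 0 0 0 <COS irq 20 (PCI level)>, VMK qla2300
"""

def cos_irqs(text):
    """Map IRQ number -> (interrupt count, device list) from /proc/interrupts."""
    irqs = {}
    for line in text.splitlines():
        # Numeric IRQ, one count column, interrupt type, then device names.
        m = re.match(r"\s*(\d+):\s+(\d+)\s+\S+\s+(.+)$", line)
        if m:
            devices = [d.strip() for d in m.group(3).split(",")]
            irqs[int(m.group(1))] = (int(m.group(2)), devices)
    return irqs

def vmk_irqs(text):
    """Map COS IRQ number -> VMkernel device list from /proc/vmware/interrupts."""
    irqs = {}
    for line in text.splitlines():
        m = re.search(r"COS irq (\d+)", line)
        devices = [d.strip() for d in re.findall(r"VMK ([^,]+)", line)]
        if m and devices:
            irqs[int(m.group(1))] = devices
    return irqs

def shared_irqs(cos_text, vmk_text, min_count=1000):
    """List (irq, service console devices, VMkernel devices) for IRQs that
    appear in both listings and are active in the service console."""
    cos, vmk = cos_irqs(cos_text), vmk_irqs(vmk_text)
    return [
        (irq, cos[irq][1], vmk[irq])
        for irq in sorted(cos)
        if irq in vmk and cos[irq][0] >= min_count
    ]

if __name__ == "__main__":
    for irq, cos_devs, vmk_devs in shared_irqs(COS_SAMPLE, VMK_SAMPLE):
        print(irq, ", ".join(cos_devs), "<->", ", ".join(vmk_devs))
```

On the example data this reports IRQ 17 (usb-uhci and ehci-hcd with aic79xx) in addition to IRQs 18 and 19, because that line is also present in both listings.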
Resolving the IRQ Sharing Issue
To resolve IRQ sharing conflicts:
- Disable the problematic device, if it is not used.
- Move the device to a different PCI slot.
- Coalesce the processing of service console device interrupts (ESX 3.5 Update 5 only).
Note: Before disabling the USB device, check with your hardware vendor whether any of your remote access cards use it.
Disabling the Device
- Prevent the service console from loading the driver for the device. To do this, remove the references to the driver from the file /etc/modules.conf. This also resolves the interrupt sharing.
- Disable the USB devices in the BIOS (possible on certain systems). Disabling the USB controllers in the BIOS also prevents the USB drivers from loading on subsequent reboot cycles.
- On hardware known to have an interrupt sharing problem, install ESXi Server instead of ESX Server to avoid the interrupt issue.
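The first option, removing the driver references from /etc/modules.conf, can be sketched as a text transformation. The Python example below is a minimal sketch that comments out the matching lines instead of deleting them, so the change is easy to revert. The function name and the sample file contents are hypothetical. Check the actual entries on your host and keep a backup copy of the file before changing it.

```python
# Hypothetical /etc/modules.conf contents, for illustration only.
SAMPLE_CONF = """\
# hypothetical /etc/modules.conf contents
alias usb-controller usb-uhci
alias usb-controller1 ehci-hcd
"""

def disable_driver(conf_text, driver):
    """Comment out every active line that names `driver` as a whole word."""
    out = []
    for line in conf_text.splitlines():
        already_commented = line.lstrip().startswith("#")
        if not already_commented and driver in line.split():
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"

if __name__ == "__main__":
    # Print the edited text; writing it back to the file is left to the reader.
    print(disable_driver(SAMPLE_CONF, "usb-uhci"), end="")
```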
The output of /proc/interrupts is now:
CPU0
0: 8690704 vmnix-edge timer
1: 3 vmnix-edge keyboard
2: 2238513 vmnix-edge VMnix interrupt
14: 212504 vmnix-edge ide0
17: 1715 vmnix-level ehci-hcd
NMI: 0
LOC: 0
ERR: 0
MIS: 0
The output of /proc/vmware/interrupts is now:
Vector PCPU 0 PCPU 1 PCPU 2 PCPU 3
...
0x71: 0 0 0 0 <COS irq 15 (ISA edge)>
0x79: 32895 36823 68596 74985 <COS irq 16 (PCI level)>,VMK aic79xx
0x81: 1760 0 0 0 COS irq 17 (PCI level), VMK aic79xx
0x89: 1596687 26796 534484 289717 <COS irq 19 (PCI level)>, VMK vmnic1
0x91: 344252 1035 1951 768 <COS irq 18 (PCI level)>, VMK vmnic0
0x99: 616 0 0 0 <COS irq 20 (PCI level)>,VMK qla2300
0xdf: 45701368 115529363 127721751 128342600 VMK timer
...
Moving the Device
- Click the Configuration tab.
- Click Advanced Settings.
- View or set Irq.IRQNumHostPend.
- esxcfg-advcfg --get /Irq/IRQNumHostPend
- esxcfg-advcfg --set <value> /Irq/IRQNumHostPend